By Stockholm University & Agroknow.
This report presents the setup, results, and analysis of SemEval-2025 Task 9, which challenged participants to classify food safety incidents based on real-world web texts. It describes the two subtasks, along with dataset structure, evaluation metrics, and successful modeling strategies. Key findings highlight the role of synthetic data, ensemble models, and transformer diversity in tackling long-tail classification under realistic constraints.
Key Highlights:
Two subtasks: predicting coarse hazard and product categories (ST1) and the specific hazard and product labels (ST2)
Dataset: 6,644 manually labeled food recall reports from official agencies (2012–2022)
Systems were evaluated on macro F1, with hazard prediction weighted more heavily than product prediction
LLM-generated synthetic data improved performance on rare classes
Top systems used ensemble methods and domain-augmented inputs
No single transformer architecture outperformed others across tasks
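
The macro F1 scoring mentioned in the highlights can be sketched as follows. This is a minimal illustration rather than the official scorer: it assumes the extra weight on hazards comes from scoring product predictions only on reports whose hazard was predicted correctly, then averaging the two macro F1 values. The function names (`macro_f1`, `combined_score`) and the toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[t] += 1  # true label gets a false negative
    per_class = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def combined_score(hazards_true, products_true, hazards_pred, products_pred):
    """Average of hazard macro F1 and product macro F1.

    Assumption: product F1 is computed only on rows where the hazard
    prediction is correct, so hazard errors also cost product credit.
    """
    f1_hazards = macro_f1(hazards_true, hazards_pred)
    keep = [i for i, (t, p) in enumerate(zip(hazards_true, hazards_pred)) if t == p]
    f1_products = macro_f1(
        [products_true[i] for i in keep],
        [products_pred[i] for i in keep],
    )
    return (f1_hazards + f1_products) / 2


# Toy example with made-up labels (not taken from the dataset).
hazards_true = ["biological", "allergens", "biological", "chemical"]
hazards_pred = ["biological", "allergens", "chemical", "chemical"]
products_true = ["meat", "dairy", "fruits", "seafood"]
products_pred = ["meat", "dairy", "fruits", "meat"]

score = combined_score(hazards_true, products_true, hazards_pred, products_pred)
print(round(score, 3))  # 0.667
```

Because macro F1 averages per-class scores without weighting by class frequency, a rare class counts as much as a common one, which is why gains on rare classes move this metric noticeably.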