Predicting Pest Occurrence with AI: An Overview

February 20, 2025

Throughout history, farmers have protected their crop yields from pests and diseases using their wits, experience, and pure intuition. The time of year and recent weather conditions gave them some idea of what to look out for when visiting their fields. While this farmer’s intuition has undeniable merit, in the current technological age it makes sense to use Artificial Intelligence (AI) to uncover the relationships between weather conditions and pest occurrences.

As part of the EFRA project and in collaboration with AGRIVI, our researchers at Wageningen Food Safety Research (WFSR) are developing pest occurrence models using AI. When AI correctly identifies pest risks, it allows for healthier crops and more efficient pesticide usage, which in turn improves food safety and food security. This article provides a high-level introduction to the approaches and challenges that come with our AI development.

Decision tree: the basic case

In most AI problems, by far the biggest driving factor in the decision-making process is the data at hand. The big and complex AI models that are so popular nowadays are usually trained using millions or even billions of examples. In the case of our pest occurrence models, a dataset of this size is not at all realistic; for every data point we need localized weather information and pest observations, which take a lot of time and manual labor to obtain. This is why we usually look for types of AI that work well with small datasets, such as decision trees. A decision tree takes all the training data we have and captures the relations between the weather data and pest occurrence in a tree-like structure of rules. A simplified example of such a rule in our case is: “If the temperature is above X degrees and humidity is below Y percent, then we predict a higher risk of a pest occurrence”.
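To make the idea of a tree of rules concrete, here is a minimal Python sketch. It is not the model used in the project. The threshold values, the function names, and the toy temperature data are all hypothetical, and the way the split is learned (counting misclassified rows for a single feature) is a deliberate simplification of how real decision tree algorithms choose their splits.

```python
# Hypothetical thresholds standing in for the "X degrees" and "Y percent" above.
TEMP_THRESHOLD_C = 20.0
HUMIDITY_THRESHOLD_PCT = 60.0


def predict_pest_risk(temperature_c, humidity_pct):
    """Walk a two-level tree of rules and return a risk label."""
    if temperature_c > TEMP_THRESHOLD_C:
        if humidity_pct < HUMIDITY_THRESHOLD_PCT:
            return "higher risk"
        return "lower risk"
    return "lower risk"


def best_threshold(values, labels):
    """Find the split on one feature that misclassifies the fewest rows.

    The rule being scored is: predict a pest (label 1) when value > threshold.
    Returns (threshold, number_of_errors).
    """
    best = (None, len(labels) + 1)
    for candidate in sorted(set(values)):
        errors = sum((v > candidate) != bool(y) for v, y in zip(values, labels))
        if errors < best[1]:
            best = (candidate, errors)
    return best


# Toy, made-up training data: daily temperature and whether a pest was seen.
temperatures = [12.0, 15.0, 18.0, 22.0, 25.0, 28.0]
pest_seen = [0, 0, 0, 1, 1, 1]

threshold, errors = best_threshold(temperatures, pest_seen)
print(threshold, errors)
print(predict_pest_risk(25.0, 40.0))
```

A full decision tree repeats this kind of split search recursively on each resulting subset of the data and across all available features, which is how it arrives at nested rules like the one quoted above.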

LSTM: taking time into account

While a decision tree works well, it has one clear flaw for pest occurrence prediction: it only takes into account data at a single time point for its prediction. As any agricultural expert will tell you, this is not representative of reality; whether there are pests on a crop depends not only on today’s weather, but on how the weather has behaved over a longer period of time. If there is one day of rain after weeks of dryness, those previous weeks are of course vital information, and our AI benefits greatly from having access to them. Time-series AI models address this by using data from multiple time points. A good example is the Long Short-Term Memory (LSTM) algorithm, a deep learning algorithm that keeps a memory of data from earlier time points to improve its predictions. A downside of an LSTM is that it needs substantially more training data, which is not always available for pest observations.
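The memory mechanism can be illustrated with a stripped-down LSTM step in plain Python. This is only a sketch under strong assumptions: it uses a single hidden unit and a single input feature (daily rainfall), and the weights are hypothetical, hand-picked, and untrained. A real LSTM uses vectors, weight matrices, and weights learned from data.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One LSTM step with a single hidden unit and a single input feature.

    weights maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, bias = weights[name]
        return activation(w_x * x + w_h * h_prev + bias)

    forget = gate("forget", sigmoid)          # how much old memory to keep
    write = gate("input", sigmoid)            # how much new information to store
    candidate = gate("candidate", math.tanh)  # the new information itself
    expose = gate("output", sigmoid)          # how much memory to show as output

    c = forget * c_prev + write * candidate   # updated memory cell
    h = expose * math.tanh(c)                 # output for this time step
    return h, c


def run_sequence(rainfall, weights):
    """Feed a sequence of daily values through the cell, return the last output."""
    h, c = 0.0, 0.0
    for x in rainfall:
        h, c = lstm_step(x, h, c, weights)
    return h


# Hypothetical, untrained weights chosen only to make the memory effect visible.
WEIGHTS = {
    "forget": (0.0, 0.0, 2.0),
    "input": (1.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 2.0),
}

dry_then_rain = [0.0] * 6 + [1.0]  # six dry days, then one day of rain
wet_week = [1.0] * 7               # rain every day

print(run_sequence(dry_then_rain, WEIGHTS))
print(run_sequence(wet_week, WEIGHTS))
```

Both sequences end with the same reading on the final day, yet the outputs differ because the memory cell carries information from the preceding days. A model that only sees the final day cannot make that distinction.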

Generating new data

Another challenge with our data is missing observations, as observations are not always collected systematically. Moreover, as the climate changes, future data will differ from the data we have now. This is why we experimented with generative AI to fill these gaps. The generative AI learns the underlying patterns in the data so that it can generate novel weather and pest observation data. Adding this generated data gives us a more complete dataset on which to train better models. Using generative AI in this way meaningfully improved our results.
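As a rough illustration of the principle of learning from observed data and then generating values for the gaps, the sketch below uses a far simpler stand-in than a generative AI model: it fits a normal distribution to the observed values and samples from it. The function name, the humidity readings, and the choice of a normal distribution are all assumptions made for this example and do not reflect the approach used in the project.

```python
import random
import statistics


def fill_gaps(series, seed=0):
    """Replace None entries with samples from a normal distribution
    fitted to the observed (non-missing) values."""
    observed = [v for v in series if v is not None]
    mean = statistics.mean(observed)
    stdev = statistics.stdev(observed)
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [v if v is not None else rng.gauss(mean, stdev) for v in series]


# Made-up daily humidity readings (percent) with two missing days.
humidity = [62.0, None, 58.5, 71.0, None, 66.2]
completed = fill_gaps(humidity)
print(completed)
```

This stand-in treats every missing value independently and ignores both the order of the days and the link between weather and pests. Capturing those relationships is precisely what a more capable generative model is meant to add.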

Take home message

There is no one-size-fits-all solution for AI projects, even ones as specific as predicting pest occurrence based on weather data. It is always good to keep all the methods and tricks at our disposal in mind when developing AI for complex data.

Article originally posted on Medium.