Multi-Source Sensor ML Pipeline: Pest Population Forecasting
We ran an eleven-model tournament across regression and classification; Random Forest reached F1 0.667 and recall 1.00 on a heavily imbalanced minority class.
- Samples: 245
- Models compared: 11
- Class imbalance: 10.67:1
- Minority recall: 1.00
Eleven-model tournament across regression and classification
We cleaned and merged a multi-source dataset across 5 monitoring sites: 245 daily samples combining meteorological readings with entomological catch counts. After harmonising inconsistent field names and mixed date formats, we engineered lag, 3-day rolling-mean, recency, and calendar features.
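The feature step can be sketched as below. This is an illustration, not the project's code. It uses plain Python so it runs without dependencies; a pandas version would typically use `shift` and `rolling`. The function name `build_features`, the feature names, the toy dates and counts, and the reading of "recency" as days since the last non-zero catch are all assumptions.

```python
from datetime import date, timedelta

def build_features(dates, catches):
    """Build per-day features from chronologically ordered daily data.

    Each feature for day i uses only days before i, so the target
    (the catch on day i) never leaks into its own inputs.
    """
    rows = []
    last_catch_idx = None  # index of the most recent day with a non-zero catch
    for i, day in enumerate(dates):
        if i >= 3:  # the 3-day rolling mean needs three prior days
            rows.append({
                "date": day,
                "lag_1": catches[i - 1],                   # yesterday's catch
                "roll_mean_3": sum(catches[i - 3:i]) / 3,  # mean of the 3 prior days
                # Recency (assumed definition): days since the last catch.
                "days_since_catch": None if last_catch_idx is None else i - last_catch_idx,
                "day_of_year": day.timetuple().tm_yday,    # calendar features
                "month": day.month,
                "target": catches[i],
            })
        if catches[i] > 0:
            last_catch_idx = i
    return rows

# Toy data for illustration only; not taken from the project dataset.
start = date(2023, 6, 1)
dates = [start + timedelta(days=k) for k in range(6)]
catches = [0, 2, 0, 0, 1, 3]
features = build_features(dates, catches)
```

Shifting every input back by at least one day matters here: a rolling mean that includes the current day would hand the model the answer it is supposed to predict.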
The question split into two: a regression task (predict tomorrow's catch count) and a binary classification task (will there be a catch event at all). We ran both as tournaments so the model choice was driven by held-out test performance, not intuition.
TimeSeriesSplit CV, class-weight balancing, threshold tuning
- Regression bracket (6 models): ARIMAX, SARIMAX, Prophet, Random Forest, XGBoost, LightGBM. All evaluated under TimeSeriesSplit cross-validation.
- Classification bracket (5 models): Random Forest, XGBoost, LightGBM, LSTM, GRU. The 10.67:1 class imbalance was handled with class_weight='balanced' and per-model F1-optimal threshold tuning on a held-out chronological validation slice.
- Champion selection: lowest test MAE on regression, highest test F1 on classification. Random Forest won both.
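The two ideas in this list, expanding-window cross-validation and balanced class weights, can be sketched in plain Python. The splitter is a hand-rolled approximation of what scikit-learn's `TimeSeriesSplit` does by default, not the library itself. The weights use the `n_samples / (n_classes * class_count)` formula that scikit-learn documents for `class_weight='balanced'`. The fold count of 5 and the 224/21 class counts are assumptions: the counts are inferred from 245 samples at 10.67:1, assuming that ratio describes the full dataset.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices); training always precedes testing."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        # Train on everything before `start`, test on the next block.
        yield list(range(start)), list(range(start, start + test_size))

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return {c: len(labels) / (len(counts) * k) for c, k in counts.items()}

# n_splits=5 is an assumption; the write-up does not state the fold count.
folds = list(expanding_window_splits(n_samples=245, n_splits=5))

# Inferred class counts: 224 no-catch days and 21 catch days (224 / 21 = 10.67).
labels = [0] * 224 + [1] * 21
weights = balanced_class_weights(labels)

for train_idx, test_idx in folds:
    print(f"train on {len(train_idx):3d} rows, test on rows {test_idx[0]}-{test_idx[-1]}")
print({c: round(w, 3) for c, w in weights.items()})
```

With these inferred counts, a minority-class error weighs about 10.67 times as much as a majority-class error, which is the imbalance ratio itself.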
Random Forest takes both brackets
The two charts below compare every model on its own test set. Random Forest in cyan; baselines in slate.
Regression tournament
Test MAE across the 6 regressors (lower is better)
Random Forest reached a test MAE of 0.34 against 2.00 for the ARIMAX baseline, about an 83% reduction.
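The 83% figure follows directly from the two MAE values. A short sketch of the arithmetic, with a reference MAE helper included for completeness (the function names are illustrative):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(baseline, candidate):
    """Fraction by which `candidate` improves on `baseline` (lower is better)."""
    return (baseline - candidate) / baseline

# MAE values as reported above: ARIMAX 2.00, Random Forest 0.34.
reduction = relative_reduction(baseline=2.00, candidate=0.34)
print(f"{reduction:.0%}")
```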
Classification tournament
F1 and AUC across the 5 classifiers (higher is better)
Random Forest wins overall on F1 (0.67) and posts a strong AUC of 0.92. XGBoost narrowly leads on AUC (0.93), but its accuracy drops sharply.
What the Random Forest classifier actually catches
Champion confusion matrix
Random Forest on the 49-row test set
The classifier caught every one of the 9 minority Catch events (recall 1.00) at the cost of 9 false positives. For a field tool that triggers pest checks, false negatives cost much more than false positives, so trading precision for recall is the right call.
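The reported metrics can be reproduced from the confusion-matrix counts. In the sketch below, the 9 true positives, 9 false positives, and 0 false negatives come from the text; the 31 true negatives are inferred as the remainder of the 49-row test set.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# tn = 49 - 9 - 9 - 0 = 31 (inferred from the test-set size).
metrics = classification_metrics(tp=9, fp=9, fn=0, tn=31)
print({name: round(value, 3) for name, value in metrics.items()})
```

Precision works out to 0.50: half of the triggered pest checks are false alarms. That is the price paid for missing no catch events.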
Test metrics
49-row chronological test set
- F1: 0.667
- AUC: 0.919
- Accuracy: 0.82
- Recall (Catch): 1.00
Minority class support: 9 samples. Threshold tuned to 0.10 on a held-out chronological validation slice.
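F1-optimal threshold tuning amounts to sweeping candidate cut-offs over the validation-slice probabilities and keeping the one with the best F1. A minimal sketch follows, using made-up labels and probabilities rather than project data. The grid spacing and function names are assumptions.

```python
def f1_at_threshold(y_true, probs, threshold):
    """F1 for the positive class when predicting 1 wherever prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_f1_threshold(y_true, probs, grid):
    """Return the grid value with the highest F1 (lowest threshold wins ties)."""
    return max(grid, key=lambda t: f1_at_threshold(y_true, probs, t))

# Candidate thresholds 0.05, 0.10, ..., 0.95 (the grid spacing is an assumption).
grid = [k / 100 for k in range(5, 100, 5)]

# Toy validation slice, invented for illustration: 2 positives among 10 rows,
# with positives scored low in absolute terms, as is common under imbalance.
val_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
val_probs = [0.02, 0.07, 0.03, 0.08, 0.04, 0.12, 0.13, 0.06, 0.30, 0.01]

threshold = best_f1_threshold(val_labels, val_probs, grid)
print(threshold, round(f1_at_threshold(val_labels, val_probs, threshold), 2))
```

The toy scores are chosen so that a threshold far below 0.5 wins. Tuning on a validation slice that precedes the test set keeps the test metrics free of threshold leakage.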
Try the pest-prediction dashboard
The three-tab Streamlit dashboard ships on HuggingFace Spaces. Tab 1 is the exploratory data analysis, tab 2 surfaces the tournament results table for every model, and tab 3 lets you pick a site and date and see the forecasted catch count alongside the actuals.
Frameworks and infrastructure
Source code on GitHub.