Educational Analytics: OULAD Click-Stream Forecasting
I trained a custom 1D CNN that edged out SARIMA, ARIMAX, and Prophet on the Open University click-stream forecast (test MAE 0.199, test MAPE 1.9%).
- 420 MB source data
- 808 daily observations
- 4 models tested
- 1.9% best test MAPE
Forecasting student VLE engagement on the Open University dataset
The Open University Learning Analytics dataset (OULAD) is a public 420 MB release of click-stream and assessment data across multiple UK course presentations. I aggregated it to 808 daily observations of total VLE interactions and benchmarked four forecasters head-to-head: SARIMA, ARIMAX, Prophet, and a custom 1D CNN.
The goal was a forecast accurate enough to inform academic planning, with at least one model in the comparison that staff without an ML background could interpret. SARIMA stayed in for that reason.
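The daily aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the input is a stream of `(day, clicks)` pairs, such as the `date` and `sum_click` columns of OULAD's `studentVle.csv`, and the helper name `daily_totals` is hypothetical. How the project aligned different course presentations onto one 808-day axis is not shown here.

```python
from collections import defaultdict


def daily_totals(rows):
    """Sum click counts per day and return (day, total) pairs sorted by day.

    `rows` is any iterable of (day, clicks) pairs. The pair layout is an
    assumption made for this sketch, not something the project specifies.
    """
    totals = defaultdict(int)
    for day, clicks in rows:
        totals[day] += clicks
    return sorted(totals.items())


# Toy rows standing in for per-student, per-resource click records.
rows = [(0, 3), (0, 5), (1, 2), (2, 7), (1, 4)]
series = daily_totals(rows)
print(series)
```

Each output pair is one daily observation of total VLE interactions, which is the unit all four forecasters are trained on.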
Two different lag windows for two different model families
Statistical models (SARIMA, ARIMAX, Prophet) and the CNN want different feature shapes, so I built two parallel windows:
- Seven lag features at 1, 2, 3, 7, 14, 21, and 28 days for the statistical models, mirroring weekly and roughly monthly seasonality.
- A 30-lag input window for the 1D CNN, so the network can learn its own short-horizon shape without my having to write seasonality terms by hand.
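The two windowing schemes above could be built along these lines. This is a hedged sketch in plain Python: the function names `lag_features` and `sliding_windows` are hypothetical, and the real notebook may construct the features differently (for example with a dataframe library).

```python
STAT_LAGS = (1, 2, 3, 7, 14, 21, 28)  # lags listed for the statistical models
CNN_WINDOW = 30                       # input window length for the 1D CNN


def lag_features(series, lags=STAT_LAGS):
    """Return (X, y): each X row holds series[t - lag] for every lag, y is series[t]."""
    start = max(lags)  # first index where every lag is available
    X = [[series[t - lag] for lag in lags] for t in range(start, len(series))]
    y = [series[t] for t in range(start, len(series))]
    return X, y


def sliding_windows(series, window=CNN_WINDOW):
    """Return (X, y): each X row is the `window` values immediately before y."""
    X = [series[t - window:t] for t in range(window, len(series))]
    y = [series[t] for t in range(window, len(series))]
    return X, y


# Toy series: the values equal their own index, which makes the lags easy to read.
toy = list(range(40))
X_lag, y_lag = lag_features(toy)
X_win, y_win = sliding_windows(toy)
print(X_lag[0], y_lag[0])
print(len(X_win[0]), y_win[0])
```

The sparse lag set keeps the statistical models' inputs small and readable, while the dense window hands the CNN every one of the previous 30 days. Applied to an 808-point series, these helpers would yield 780 and 778 rows respectively, before any train/test split.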
CNN edged out Prophet by a thin margin
| Model | Test MAE | Test MAPE | Notes |
|---|---|---|---|
| 1D CNN | 0.199 | 1.9% | Winner. 50 epochs, batch 32, 30-lag window. |
| Prophet | 0.203 | 2.0% | Razor-thin runner-up. |
| ARIMAX | n/a | n/a | Solid baseline; see chart below. |
| SARIMA | n/a | n/a | Kept as interpretable seasonal baseline. |
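For reference, the two metrics in the table follow the standard definitions of mean absolute error and mean absolute percentage error. The sketch below is an illustration with made-up numbers, not the project's evaluation code; the function names are hypothetical, and it assumes no actual value is zero (MAPE is undefined there).

```python
def mae(actual, predicted):
    """Mean absolute error, in the same units as the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values purely to show the arithmetic.
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]
print(mae(actual, predicted))
print(mape(actual, predicted))
```

MAE depends on the scale of the series while MAPE does not, so the two columns are best read together when comparing models.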
What each model sees and predicts
Three notebook outputs: the raw time series with anomalies flagged, Prophet's forecast against observed clicks in the original scale, and the winning CNN's training-data fit plus its out-of-sample predictions.



Frameworks and infrastructure
Source code on GitHub.