
HydroGEM Evaluation on ECCC Sites

Anomaly detection performance on 106 Canadian hydrometric stations

Evaluation Setup

Goal: Evaluate HydroGEM's ability to detect anomalies by comparing its detections against hydrologist corrections.

Ground Truth Labels

We create anomaly labels from the difference between raw and corrected data. If the correction exceeds 1% of the site's flow range, we label that timestep as anomalous.

Raw Value: 125.3
Corrected Value: 98.7
Label: Anomaly

If |raw - corrected| > 1% of site range → labeled as anomaly
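
The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation's actual code: the function name `label_anomalies` is hypothetical, and it assumes the "site range" is the maximum minus the minimum of the corrected series, which the text does not specify.

```python
def label_anomalies(raw, corrected, threshold_frac=0.01):
    """Flag timesteps where |raw - corrected| exceeds a fraction of the site range."""
    # Assumption: the site range is max - min of the corrected series.
    site_range = max(corrected) - min(corrected)
    cutoff = threshold_frac * site_range
    return [abs(r - c) > cutoff for r, c in zip(raw, corrected)]


# Toy series; the second timestep reuses the example values 125.3 and 98.7.
raw = [100.0, 125.3, 50.0, 10.0]
corrected = [100.0, 98.7, 50.2, 10.0]
labels = label_anomalies(raw, corrected)
print(labels)
```

In this toy series the corrected range is 90, so the cutoff is 0.9. Only the second timestep, where the correction is about 26.6, is labeled anomalous.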

Site Selection Criteria

We selected sites where the corrected data follows expected physical relationships.

Stage-Discharge Correlation: Higher water level corresponds to higher flow (ρ > 0.5)
Rating Curve Fit: Stage and discharge follow a predictable relationship (R² ≥ 0.3)
Data Completeness: Windows with sufficient valid data (≥ 70%)
Meaningful Corrections: 10-40% of data was corrected (enough anomalies to evaluate)
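
As a rough sketch, the four criteria can be combined into a single filter over precomputed statistics. The names `SiteStats` and `passes_selection` are hypothetical. The sketch assumes the 10-40% bounds are inclusive and that completeness is summarized as a single fraction per candidate; the text does not state either.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    rho: float                 # stage-discharge correlation
    r2: float                  # rating curve goodness of fit
    valid_fraction: float      # fraction of valid data
    corrected_fraction: float  # fraction of data that was corrected


def passes_selection(s: SiteStats) -> bool:
    """Return True when all four selection criteria hold."""
    return (
        s.rho > 0.5
        and s.r2 >= 0.3
        and s.valid_fraction >= 0.70
        # Assumption: both ends of the 10-40% band are inclusive.
        and 0.10 <= s.corrected_fraction <= 0.40
    )


kept = passes_selection(SiteStats(rho=0.8, r2=0.6, valid_fraction=0.9, corrected_fraction=0.25))
dropped = passes_selection(SiteStats(rho=0.8, r2=0.6, valid_fraction=0.9, corrected_fraction=0.05))
print(kept, dropped)
```

The second candidate is rejected because only 5% of its data was corrected, which leaves too few anomalies to evaluate.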

Results Summary

Sites Evaluated: 106
Mean F1 Score: 0.574
Mean Precision: 0.642
Mean Recall: 0.560
Sites with F1 ≥ 0.5: 80

[Figures: F1 Score Distribution; F1 Score by Site; Precision vs Recall]

Performance Summary

Metric                 Value
Number of Sites        106
Mean F1                0.574 ± 0.102
Median F1              0.575
Mean Precision         0.642
Mean Recall            0.560
Sites with F1 ≥ 0.4    101
Sites with F1 ≥ 0.5    80
Sites with F1 ≥ 0.6    46
Sites with F1 ≥ 0.7    12
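
For reference, the metrics in this summary can be computed from per-timestep boolean labels and detections as in the sketch below. The function names are hypothetical, and the sketch assumes timestep-level matching with no tolerance window. How the evaluation actually matched detections to labels is not stated here.

```python
from statistics import mean, median


def precision_recall_f1(truth, detected):
    """Compute precision, recall, and F1 from parallel boolean sequences."""
    tp = sum(t and d for t, d in zip(truth, detected))
    fp = sum((not t) and d for t, d in zip(truth, detected))
    fn = sum(t and (not d) for t, d in zip(truth, detected))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def summarize(f1_scores, thresholds=(0.4, 0.5, 0.6, 0.7)):
    """Aggregate per-site F1 scores into the kind of summary tabulated above."""
    return {
        "n_sites": len(f1_scores),
        "mean_f1": mean(f1_scores),
        "median_f1": median(f1_scores),
        "sites_at_or_above": {t: sum(f >= t for f in f1_scores) for t in thresholds},
    }


# Toy data: 2 true positives, 1 false positive, 2 false negatives.
truth = [True, True, True, False, False, False, False, True]
detected = [True, True, False, True, False, False, False, False]
p, r, f1 = precision_recall_f1(truth, detected)
summary = summarize([0.93, 0.55, 0.40, 0.39])
print(p, r, f1, summary)
```

Averaging per-site F1 is not the same as taking the F1 of the averaged precision and recall. The harmonic mean of 0.642 and 0.560 is about 0.598, while the mean of the per-site F1 scores is 0.574.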

Comparison with Baselines

Method            Type               Mean F1
HydroGEM          Foundation Model   0.574
Persistence       Baseline           0.326
Isolation Forest  ML                 0.245
IQR (Tukey)       Statistical        0.101
LOF               ML                 0.097
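
To make the statistical baseline concrete, here is a minimal sketch of a Tukey IQR detector. It assumes the conventional fence multiplier k = 1.5 applied directly to raw values. The configuration actually used for the IQR baseline in this comparison is not given, and `tukey_outliers` is a hypothetical name.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


flows = [10, 11, 12, 11, 10, 12, 11, 50]
flags = tukey_outliers(flows)
print(flags)
```

A detector like this considers only the overall distribution of values. It has no way to flag a reading that is plausible in magnitude but wrong for its time or context.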

Detection Examples

Reading the plots: Blue shaded regions show where corrections were made (ground truth). Bottom panel compares HydroGEM detections against ground truth.

Legend: Raw, Corrected, HydroGEM, Ground Truth, Detection
🏆 Best (F1 > 0.55)
Station   F1     Precision  Recall
05AJ001   0.932  0.88       0.99
02YG001   0.932  0.90       0.96
08EC013   0.848  0.76       0.97
01DH002   0.824  0.88       0.77
05AD007   0.816  0.73       0.93
05BJ004   0.808  0.96       0.70
05DF001   0.807  0.69       0.98
02KF006   0.782  0.67       0.94
05FE004   0.780  0.84       0.73
05AG006   0.768  0.79       0.75
✓ Good (F1 0.40 - 0.55)
Station   F1     Precision  Recall
05BJ004   0.545  0.45       0.70
05EA005   0.545  0.47       0.64
05CC001   0.545  0.46       0.67
05BA002   0.540  0.37       1.00
05CA009   0.540  0.39       0.87
07AA001   0.538  0.42       0.74
05DD009   0.536  0.41       0.78
05AA008   0.536  0.38       0.91
02ZL004   0.532  0.42       0.73
01BJ007   0.529  0.47       0.60
◐ Moderate (F1 0.25 - 0.40)
Station   F1     Precision  Recall
01BP002   0.400  0.33       0.50
05AD010   0.399  0.36       0.45
01AF007   0.399  0.39       0.41
05DF001   0.398  0.30       0.57
01AK006   0.398  0.36       0.44
01BJ007   0.398  0.48       0.34
07BE001   0.391  0.29       0.60
02YL001   0.391  0.42       0.37
01BJ003   0.390  0.44       0.35
07BK007   0.389  0.39       0.39