Hindcast Methodology — 2010–2024 Western Himalayan Replay
Catalogue Construction
The catalogue (historical_events.yaml) was assembled from four
independent records to minimise systematic bias:
- ICIMOD Hindu-Kush Himalaya event log — peer-reviewed reconnaissance reports for GLOFs and large mass movements.
- CWC India-WRIS gauge archive — peak-discharge records at Akhnoor, Salal, Baglihar, Dul Hasti.
- JKSDMA / HPSDMA winter situation reports — avalanche releases on NH-44 and NH-244.
- Peer-reviewed literature — Sati & Gahalaut (2014), Singh et al. (2021), Allen et al. (2023).
Each event records:
- ISO date, sub-basin, geographic location
- Hazard class (GLOF / AVALANCHE / RUNOFF)
- Magnitude with explicit uncertainty band
- Documented fatalities (used only for severity post-classification)
- Source citations
Exclusion rule: any event lacking at least two independent sources is omitted. The catalogue version is pinned in the report header for reproducibility.
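The exclusion rule can be illustrated with a minimal sketch. The `Event` dataclass, its field names, and the sample entries below are hypothetical placeholders; the actual schema of historical_events.yaml is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical in-memory form of one catalogue entry."""
    date: str                    # ISO date
    sub_basin: str
    hazard: str                  # "GLOF", "AVALANCHE" or "RUNOFF"
    sources: list = field(default_factory=list)


def apply_exclusion_rule(events, min_sources=2):
    """Keep only events backed by at least `min_sources` distinct sources."""
    return [e for e in events if len(set(e.sources)) >= min_sources]


# Made-up entries for illustration only; not taken from the real catalogue.
events = [
    Event("2015-01-01", "basin-a", "GLOF", ["source-1", "source-2"]),
    Event("2015-01-02", "basin-b", "AVALANCHE", ["source-1"]),
    Event("2015-01-03", "basin-c", "RUNOFF", ["source-1", "source-1"]),
]
kept = apply_exclusion_rule(events)
```

Counting distinct sources (via `set`) means a citation listed twice does not satisfy the rule on its own. Whether two sources are truly independent is a judgement this sketch does not attempt.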
Replay Protocol
For each event we reconstruct the forcing window
[T_event − 7 d, T_event + 1 d] from archival ERA5-Land, MODIS, and
Sentinel-1/2 acquisitions cached in Cloudflare R2. The simulator runs
in hindcast mode with the following deviations from operational mode:
| Setting | Operational | Hindcast |
|---|---|---|
| Forcing latency | real-time NRT | archival cached COGs |
| Lead-time issue | T_now → T+72 h | T_event − 72 h → T_event + 24 h |
| Q-AE shots | 8192 | 32768 (variance reduction) |
| EvaluationAgent | rolling 30 d | full catalogue |
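As a rough illustration of the two hindcast windows (the 8-day forcing window and the lead-time issue window from the table above), the following sketch derives both from an event time. The function name and the sample timestamp are assumptions, not part of the simulator.

```python
from datetime import datetime, timedelta, timezone


def hindcast_windows(t_event):
    """Return (forcing_window, issue_window) as (start, end) tuples."""
    forcing = (t_event - timedelta(days=7), t_event + timedelta(days=1))
    issue = (t_event - timedelta(hours=72), t_event + timedelta(hours=24))
    return forcing, issue


# Illustrative event time, not a catalogue entry.
t_event = datetime(2015, 1, 10, tzinfo=timezone.utc)
forcing, issue = hindcast_windows(t_event)
```

Both windows end at T_event + 24 h. Only the start differs, so the forcing window always covers the whole issue window.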
The deterministic fixture used in CI (runner._replay_event) seeds the
RNG from the event timestamp so test runs are bit-reproducible while
preserving realistic skill statistics.
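One way to seed an RNG from an event timestamp is sketched below. This is an assumption about the general approach, not the actual body of runner._replay_event; `rng_for_event` is a hypothetical helper.

```python
import random
from datetime import datetime, timezone


def rng_for_event(t_event):
    """Build a private RNG whose seed is the event's Unix timestamp."""
    seed = int(t_event.timestamp())
    return random.Random(seed)


# Illustrative timestamps only.
t_a = datetime(2015, 1, 10, tzinfo=timezone.utc)
t_b = datetime(2015, 1, 11, tzinfo=timezone.utc)

draws_a1 = [rng_for_event(t_a).random() for _ in range(3)]
draws_a2 = [rng_for_event(t_a).random() for _ in range(3)]
seq_a = rng_for_event(t_a)
seq_b = rng_for_event(t_b)
run_a = [seq_a.random() for _ in range(5)]
run_b = [seq_b.random() for _ in range(5)]
```

Using a dedicated `random.Random` instance rather than the module-level generator keeps one event's draws from being disturbed by other code that also consumes random numbers.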
Skill Metrics
| Metric | Formula | Pass threshold |
|---|---|---|
| ROC-AUC | (rank-sum of positives − n_pos · (n_pos + 1) / 2) / (n_pos · n_neg) | ≥ 0.80 |
| Brier score | mean (p − o)² | ≤ 0.18 |
| Lead time | T_event − T_issue | ≥ 24 h mean |
| Severity match | fraction of events documented ≥ watch that were predicted ≥ watch | ≥ 0.85 |
| Magnitude bias | predicted_Q − observed_Q | |
Null events are synthesised from quiet periods sampled at the same sub-basins. Without them the catalogue would contain only positives (n_neg = 0), and the ROC-AUC would be undefined.
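The first two metrics can be sketched in plain Python. This is an illustrative implementation under stated assumptions (binary outcomes coded 0/1, ties given average ranks), not the EvaluationAgent code; the function names are hypothetical.

```python
def roc_auc(probs, outcomes):
    """ROC-AUC from the Mann-Whitney rank-sum statistic (ties get midranks)."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        # This is the degenerate case that the null events exist to prevent.
        raise ValueError("AUC is undefined without both positives and negatives")

    order = sorted(range(len(probs)), key=lambda i: probs[i])
    ranks = [0.0] * len(probs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and probs[order[j + 1]] == probs[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1          # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1

    rank_sum_pos = sum(r for r, o in zip(ranks, outcomes) if o == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Made-up forecasts: two events followed by two null (quiet-period) cases.
probs = [0.9, 0.4, 0.6, 0.1]
outcomes = [1, 1, 0, 0]
auc = roc_auc(probs, outcomes)
bs = brier_score(probs, outcomes)
```

With an all-positive input `roc_auc` raises rather than returning a number.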
Reliability & Calibration
The reliability_diagram is computed quarterly. If any populated bin
deviates from the diagonal by more than 0.15, the EvaluationAgent recalibrates
the QAE → severity mapping via isotonic regression and the new
calibration table is committed to data/calibration/qae_severity_*.json
with a CHANGELOG entry.
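The recalibration trigger and the isotonic fit can be sketched as follows. The equal-width binning, the function names, and the pool-adjacent-violators routine are assumptions made for illustration. The actual QAE → severity mapping and its JSON table format are not shown.

```python
def reliability_bins(probs, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) for each populated bin."""
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        acc[b][0] += p
        acc[b][1] += o
        acc[b][2] += 1
    return [(sp / n, so / n, n) for sp, so, n in acc if n > 0]


def needs_recalibration(bins, tol=0.15):
    """True if any populated bin is further than `tol` from the diagonal."""
    return any(abs(obs - pred) > tol for pred, obs, _ in bins)


def isotonic_fit(x, y):
    """Pool-adjacent-violators: non-decreasing least-squares fit of y on x."""
    pairs = sorted(zip(x, y))
    blocks = []                                 # each block is [sum_y, count]
    for _, yi in pairs:
        blocks.append([yi, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return [xi for xi, _ in pairs], fitted


# Made-up, over-confident forecasts: 0.9 issued four times, one event observed.
bins = reliability_bins([0.9, 0.9, 0.9, 0.9], [1, 0, 0, 0])
trigger = needs_recalibration(bins)
xs, fitted = isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
```

A library routine such as scikit-learn's `IsotonicRegression` would be a reasonable substitute for the hand-written fit. The pure-Python version is used here only to keep the sketch dependency-free.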
Provenance
Every hindcast report is hashed (SHA-256 over canonical JSON) and the
hash is anchored to Polygon PoS via AuditAgent.anchor_artifact. The
transaction hash is shown next to the AUC gauge on the Validation page.
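A minimal sketch of the hashing step is shown below. The canonicalisation used here (sorted keys, compact separators, UTF-8) is an assumption, since the exact canonical JSON scheme is not specified above. The on-chain anchoring via AuditAgent.anchor_artifact is not reproduced, and `report_digest` is a hypothetical name.

```python
import hashlib
import json


def report_digest(report):
    """SHA-256 hex digest of a report serialised with sorted keys and no padding."""
    canonical = json.dumps(
        report, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up report fragments; the key order differs but the content is identical.
report_a = {"catalogue_version": "x", "auc": 0.8}
report_b = {"auc": 0.8, "catalogue_version": "x"}
digest_a = report_digest(report_a)
digest_b = report_digest(report_b)
```

Sorting keys before hashing means two reports with the same content produce the same digest regardless of dictionary insertion order, which is what makes the anchored hash reproducible.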