
Phase 8 — V&V, Security & Production Deploy

Duration: ~7 working days
Critical-path predecessors: Phases 2, 3, 5
Status: in progress · final phase

Deliverables (§9 / §11 / §15)

  1. Hindcast harness — replay 7+ documented Chenab events 2010–2024
  2. STRIDE threat model + ZeroTrustShield middleware
  3. Security-scanning CI gate: Bandit (SAST), Safety and Trivy (dependency and image vulnerabilities), ZAP baseline (DAST), gitleaks (secrets)
  4. OpenTelemetry + Prometheus + Sentry observability stack
  5. Grafana SLO dashboards + Prometheus alert rules
  6. Runbooks — GLOF onset · avalanche aftermath · total cloud occlusion
  7. Production Netlify deploy with custom domain snow-ir.app
  8. GitHub Pages docs deploy with full handbook + ADR archive
  9. SystemHealthBadge — §15 Definition-of-Done stamp on the console
  10. release.yml workflow: semantic-version tag v1.0.0
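The hindcast harness (deliverable 1) ultimately reduces each replayed event to a comparison between when an alert was issued and when the event began, which is what acceptance criterion 3 (mean lead time ≥ 24 h) scores. Below is a minimal sketch of that lead-time step, assuming each event record carries a documented onset time plus the times at which replayed forecasts crossed the alert threshold. `HindcastEvent`, `lead_time_hours` and `mean_lead_time` are hypothetical names, not the `snow_ir.vv.hindcast.runner` API, and reporting misses separately (rather than scoring them as 0 h) is an assumption the source does not settle.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class HindcastEvent:
    """One replayed event (hypothetical schema, not the runner's real type)."""

    name: str
    onset: datetime  # documented event onset
    alert_times: list[datetime] = field(default_factory=list)  # replayed alert issue times


def lead_time_hours(event: HindcastEvent) -> float | None:
    """Hours from the earliest alert issued at or before onset; None means a miss."""
    early = [t for t in event.alert_times if t <= event.onset]
    if not early:
        return None  # alerts issued after onset gave no warning, so they don't count
    return (event.onset - min(early)) / timedelta(hours=1)


def mean_lead_time(events: list[HindcastEvent]) -> tuple[float | None, list[str]]:
    """Mean lead time over detected events, plus the names of missed events.

    ASSUMPTION: misses are reported separately rather than scored as 0 h.
    """
    leads: list[float] = []
    missed: list[str] = []
    for ev in events:
        lt = lead_time_hours(ev)
        if lt is None:
            missed.append(ev.name)
        else:
            leads.append(lt)
    return (mean(leads) if leads else None), missed
```

With three synthetic events, one warned 30 h ahead, one warned 20 h ahead (plus a late alert after onset) and one never warned, `mean_lead_time` returns `25.0` together with the missed event's name, so that toy set would clear the 24 h bar while still surfacing the miss.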

Acceptance Gate · §15 Definition of Done

# | Criterion | Verifier | Evidence artefact
1 | Hindcast AUC ≥ 0.80 over the 2010–2024 catalogue | snow_ir.vv.hindcast.runner | data/vv/hindcast_report.json
2 | Mean Brier score ≤ 0.18 | skill_metrics.brier_score | release-notes summary
3 | Mean lead time ≥ 24 h on documented events | hindcast scoring | hindcast_report.json
4 | All §9.3 gates green (R² ≥ 0.85, NSE ≥ 0.6, KGE ≥ 0.55) | nightly validation-summary | /validation/summary/rolling30d
5 | Lighthouse Performance ≥ 0.90, Accessibility ≥ 0.95 | lhci autorun | LHCI artefact in release.yml
6 | Backend coverage ≥ 75 % | pytest-cov | CI log
7 | Zero high/critical findings · Bandit, Safety, Trivy, ZAP | security-extended.yml | weekly SARIF upload
8 | STRIDE threat model reviewed | docs/security/threat-model.md | quarterly review header
9 | Three runbooks committed | docs/runbooks/*.md | repository tree
10 | Polygon audit anchors verifiable for ≥ 100 alerts | AuditAgent + explorer | audit_events.tx_hash
11 | OpenTelemetry traces visible in Grafana Tempo | observability.install | dashboard screenshot
12 | Prometheus alerts wired to PagerDuty / OpsGenie | infra/grafana/alerts/*.yaml | alertmanager route
13 | Netlify production site at snow-ir.app with HSTS preload | netlify.toml | securityheaders.com A+
14 | GitHub Pages docs at snow-ir.app/docs | deploy-docs job | actions deployment URL
15 | SystemHealthBadge shows ≥ 30 days unbroken | console UI | screenshot in release notes
16 | v1.0.0 tag pushed; release notes attached | release.yml | GitHub release
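Criteria 1, 2 and 4 are numeric thresholds, so the gate itself is mechanical once the metrics are computed. A dependency-free sketch of the standard definitions follows: AUC as the Mann-Whitney probability that an event outscores a non-event, the Brier score as mean squared probability error, Nash-Sutcliffe efficiency, and Kling-Gupta efficiency in its Gupta et al. (2009) form. The function names and the `GATES` table are hypothetical (not `skill_metrics`), and reading R² as the squared Pearson correlation is an assumption, since the source does not define it.

```python
from __future__ import annotations

import math
from statistics import fmean, pstdev


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random event scores above a random non-event (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    # Exact pairwise Mann-Whitney count; O(n*m) is fine for a hindcast-sized catalogue.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return fmean((p - o) ** 2 for p, o in zip(probs, outcomes))


def pearson_r(obs: list[float], sim: list[float]) -> float:
    mo, ms = fmean(obs), fmean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def nse(obs: list[float], sim: list[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / squared deviations of obs about its mean."""
    mo = fmean(obs)
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def kge(obs: list[float], sim: list[float]) -> float:
    """Kling-Gupta efficiency: correlation, variability ratio and bias ratio combined."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)
    beta = fmean(sim) / fmean(obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Thresholds copied from criteria 1, 2 and 4 of the acceptance table.
GATES: dict[str, tuple[str, float]] = {
    "auc": (">=", 0.80),
    "brier": ("<=", 0.18),
    "r2": (">=", 0.85),  # ASSUMPTION: R² taken as pearson_r(obs, sim) ** 2
    "nse": (">=", 0.6),
    "kge": (">=", 0.55),
}


def check_gates(metrics: dict[str, float]) -> dict[str, bool]:
    """Pass/fail per gate; a metric absent from `metrics` counts as a failure."""
    results = {}
    for name, (op, limit) in GATES.items():
        value = metrics.get(name)
        if value is None:
            results[name] = False
        else:
            results[name] = value >= limit if op == ">=" else value <= limit
    return results
```

Treating a missing metric as a failure keeps the gate fail-closed: a release job built along these lines could refuse to tag whenever any entry of `check_gates(...)` is `False`, including the case where a metric silently stopped being produced.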

Phase 8 Day-by-Day Plan

Day | Focus | Owner
D1 | Repo scaffold + historical event catalogue + hindcast runner | V&V lead
D2 | Skill metrics, hindcast tests, EvaluationAgent rolling window | V&V lead
D3 | STRIDE threat model + ZeroTrustShield + secret-scan tests | Security lead
D4 | OpenTelemetry + Prometheus + Sentry wiring; Grafana dashboards | SRE
D5 | Runbooks (GLOF, avalanche, cloud occlusion); tabletop dry-run | Ops lead
D6 | release.yml + security-extended.yml; Netlify production cutover | DevOps
D7 | LHCI gate; system-health badge; v1.0 tag + retrospective | Tech lead
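The system-health badge scheduled for D7 has to show at least 30 unbroken days (criterion 15). The source does not say what "unbroken" is measured against; this sketch assumes one boolean per calendar day meaning "every §9.3 gate was green in that night's validation run", and treats a missing day as a break so a skipped nightly job cannot pass as a healthy one. `unbroken_green_days`, `badge_state` and the badge strings are hypothetical, not the SystemHealthBadge component's code.

```python
from __future__ import annotations

from datetime import date, timedelta


def unbroken_green_days(daily_green: dict[date, bool], today: date) -> int:
    """Count consecutive green days ending at `today`.

    ASSUMPTION: a red day or a day with no recorded result ends the streak.
    """
    streak = 0
    day = today
    while daily_green.get(day, False):
        streak += 1
        day -= timedelta(days=1)
    return streak


def badge_state(daily_green: dict[date, bool], today: date, required_days: int = 30) -> str:
    """Hypothetical badge text: 'done' once the streak reaches `required_days`."""
    streak = unbroken_green_days(daily_green, today)
    return "done" if streak >= required_days else f"{streak}/{required_days} days"
```

Counting backwards from today means a single red night resets progress, which matches a literal reading of "unbroken"; a team that prefers a 30-day rolling pass rate would need a different function.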

ADR Anchors

  • ADR-014 — Why hindcast AUC, not POD/FAR, as the headline skill metric
  • ADR-015 — Trade-off: ZeroTrustShield in-process vs API-gateway-only
  • ADR-016 — Choice of Polygon PoS over Ethereum mainnet for audit anchors
  • ADR-017 — Lighthouse 0.90 perf budget vs MapLibre WebGL workload
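ADR-014's actual rationale lives in the ADR itself. One property relevant to that choice is easy to demonstrate: POD (probability of detection) and FAR are read off a 2×2 contingency table at a single alert threshold, so they shift whenever the threshold moves, whereas ROC AUC summarises ranking skill across all thresholds. The sketch below uses synthetic scores and a hypothetical helper `pod_far`; FAR here is the false-alarm *ratio* used in forecast verification, not the false-alarm rate.

```python
from __future__ import annotations

import math


def pod_far(labels: list[int], scores: list[float], threshold: float) -> tuple[float, float]:
    """POD and FAR from the 2x2 contingency table at one alert threshold.

    POD = hits / (hits + misses)
    FAR = false alarms / (hits + false alarms)   (false-alarm ratio)
    Ratios with an empty denominator are returned as NaN.
    """
    hits = misses = false_alarms = 0
    for y, s in zip(labels, scores):
        alerted = s >= threshold
        if y == 1 and alerted:
            hits += 1
        elif y == 1:
            misses += 1
        elif alerted:
            false_alarms += 1  # non-event that still triggered an alert
    pod = hits / (hits + misses) if hits + misses else math.nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else math.nan
    return pod, far


if __name__ == "__main__":
    # Synthetic labels/scores, purely illustrative.
    labels = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    for t in (0.8, 0.5, 0.35):
        pod, far = pod_far(labels, scores, t)
        print(f"threshold={t:.2f}  POD={pod:.2f}  FAR={far:.2f}")
```

On this toy data the same forecasts give POD 0.33 / FAR 0.00 at a 0.80 threshold, 0.67 / 0.33 at 0.50, and 1.00 / 0.25 at 0.35, so any single POD/FAR pair partly reflects where the threshold was set rather than the model alone.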