calvin.goh
Personal · production-grade · Python · Pydantic · managed Postgres · 7 pipelines · daily on hosted CI

A monitoring layer so I stop finding out about broken pipelines from the BI user.

Daily ETL that pulls the government's CPI data and MCOICOP classifications from the Malaysian open-data API: async extract with retries, a Pydantic-validated transform, and an upsert into managed Postgres. A scheduled CI workflow on a hosted runner orchestrates the runs and auto-publishes the regenerated dashboard JSONs. Five SLOs are monitored, and any breach pages Slack, email, and Discord. Each pipeline averages 2.3s per run.
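The retry logic itself isn't shown on this page. As a minimal sketch of "async extract with retries", assuming each dataset is fetched by a zero-argument coroutine and that transient network errors surface as `OSError` or `asyncio.TimeoutError`, one common pattern is an exponential-backoff loop around each fetch, with all fetches run concurrently via `asyncio.gather`. The names `with_retries` and `extract_all` are hypothetical, and the HTTP client is deliberately left out.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    fetch: Callable[[], Awaitable[T]],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: tuple = (OSError, asyncio.TimeoutError),
) -> T:
    """Await fetch(), retrying transient errors with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return await fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the error instead of returning nothing
            # With the default base_delay: 0.5s, 1s, 2s, ... plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


async def extract_all(fetchers: dict) -> dict:
    """Run every dataset's fetch concurrently, each inside its own retry loop.

    The first dataset that exhausts its retries makes the whole extract raise.
    """
    names = list(fetchers)
    results = await asyncio.gather(*(with_retries(fetchers[n]) for n in names))
    return dict(zip(names, results))
```

Jitter keeps concurrent retries from hitting the API in lockstep, and re-raising on the last attempt means a dataset that never recovers shows up as a failed run rather than an empty extract.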

24h success rate: 100% (SLO ≥ 95%)
SLO health score: 80 (4 of 5 SLOs met)
Avg latency: 2.3s (SLO ≤ 5m)
// LIVE · last 7 days

Pipeline health grid.

[Interactive health grid · legend: OK / Warn / Fail]
// INSPECTOR

CPI Core · status: ok

Latency, last 3 runs (p50, seconds): SLA 300s · p50 2s · runs in window 3 · fails 0 · records 639

Last run (just now): SUCCESS · 455 records · 2.8s, within SLA · downstream refresh queued

[Chart: run history by day]
// SLO SCORECARD

5 SLOs, last 7 days.

The SLOs are defined in the alerting config module. When any SLO misses its target, the same alert fans out to Slack, email, and Discord: three channels for redundancy.
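The alerting config module isn't shown, so here is a minimal sketch of the "same alert, three channels" idea, assuming each channel is wrapped in a plain callable that raises on failure. `Alert`, `Sender`, and `fan_out` are hypothetical names; real senders would wrap a Slack webhook, SMTP, and a Discord webhook. Each channel is tried independently, so one outage can't swallow the other two copies.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Alert:
    slo: str
    target: str
    actual: str


# Hypothetical interface: a sender delivers one Alert and raises on failure.
Sender = Callable[[Alert], None]


def fan_out(alert: Alert, senders: Dict[str, Sender]) -> Dict[str, bool]:
    """Send the same alert through every channel; a failing channel never blocks the rest."""
    delivered: Dict[str, bool] = {}
    for channel, send in senders.items():
        try:
            send(alert)
            delivered[channel] = True
        except Exception:
            delivered[channel] = False  # keep going: redundancy is the point
    if not any(delivered.values()):
        # Only escalate when every channel failed; otherwise at least one copy landed.
        raise RuntimeError(f"alert for {alert.slo} reached no channel")
    return delivered
```

Returning the per-channel result makes it easy to log which pipes were down alongside the alert itself.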

Success rate: 100.0% (target ≥ 95%)
Error rate: 0.0% (target ≤ 5%)
Avg duration: 4.7s (target ≤ 300s)
Data quality: 89.1% (target ≥ 98%)
Data freshness: 15s (target ≤ 24h)
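A minimal sketch of how a scorecard like this could be evaluated, assuming the health score is simply the percentage of SLOs currently met (that assumption reproduces the 80 shown for 4 of 5). The `SLO` type, the metric keys, and `scorecard` are hypothetical; each SLO carries its comparison direction, since some targets are floors (≥) and others ceilings (≤).

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SLO:
    name: str
    target: float
    meets: Callable[[float, float], bool]  # called as meets(actual, target)


# Targets from the scorecard; rates in %, durations in seconds.
SLOS = [
    SLO("success_rate", 95.0, operator.ge),
    SLO("error_rate", 5.0, operator.le),
    SLO("avg_duration_s", 300.0, operator.le),
    SLO("data_quality", 98.0, operator.ge),
    SLO("freshness_s", 24 * 3600.0, operator.le),
]


def scorecard(actuals: Dict[str, float]) -> Tuple[Dict[str, bool], int]:
    """Check each SLO against its target; health score = % of SLOs met, rounded."""
    results = {s.name: s.meets(actuals[s.name], s.target) for s in SLOS}
    score = round(100 * sum(results.values()) / len(results))
    return results, score
```

Feeding in the 7-day values above (100.0, 0.0, 4.7s, 89.1, 15s) marks `data_quality` as the only miss and returns a score of 80.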
// PER-PIPELINE

Adherence by pipeline.

Each pipeline is tracked against the same ≥ 95% success-rate target. A pipeline that falls below it gets a deeper investigation before the next scheduled run.

CPI Core: 100% (target ≥ 95%)
CPI Headline: 100% (target ≥ 95%)
CPI State: 100% (target ≥ 95%)
Data Validation: 100% (target ≥ 95%)
MCOICOP Class: 100% (target ≥ 95%)
MCOICOP Division: 100% (target ≥ 95%)
MCOICOP Group: 100% (target ≥ 95%)
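A minimal sketch of per-pipeline adherence, assuming the `etl_runs` metastore can be read as `(pipeline, status)` rows with a status of `"success"` for good runs. The status values and the function names `adherence` and `needs_investigation` are hypothetical; the 95% target is the one stated above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TARGET = 0.95  # same success-rate target for every pipeline


def adherence(runs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Success rate per pipeline from (pipeline, status) rows."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for pipeline, status in runs:
        totals[pipeline] += 1
        if status == "success":
            successes[pipeline] += 1
    return {p: successes[p] / totals[p] for p in totals}


def needs_investigation(rates: Dict[str, float], target: float = TARGET) -> List[str]:
    """Pipelines below target, to look into before the next scheduled run."""
    return sorted(p for p, rate in rates.items() if rate < target)
```

Computing rates per pipeline rather than globally keeps one flaky source from hiding behind six healthy ones.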
// THE STACK

Cheap, boring, monitored.

Hosted CI scheduler
Daily cron · manual dispatch · run logs
Python · Pandas
Async transform · Pydantic validation
Managed Postgres
Upsert · etl_runs metastore · 600+ runs
Slack + Email + Discord
Alerts on SLO breach · on-call is just me
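A minimal sketch of the idempotent upsert listed under managed Postgres, using the standard-library `sqlite3` module as a stand-in so it runs anywhere. The `INSERT ... ON CONFLICT (...) DO UPDATE SET col = EXCLUDED.col` form is valid in both PostgreSQL and SQLite (3.24+). The `cpi` table and its columns are hypothetical, and with a Postgres driver such as psycopg the `?` placeholders would become `%s`.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical table: one row per (series, period), so re-running a day overwrites it.
DDL = """
CREATE TABLE IF NOT EXISTS cpi (
    series      TEXT NOT NULL,
    period      TEXT NOT NULL,
    index_value REAL NOT NULL,
    PRIMARY KEY (series, period)
)
"""

# Same ON CONFLICT ... DO UPDATE clause that PostgreSQL accepts.
UPSERT = """
INSERT INTO cpi (series, period, index_value)
VALUES (?, ?, ?)
ON CONFLICT (series, period) DO UPDATE SET index_value = excluded.index_value
"""


def upsert_rows(conn: sqlite3.Connection, rows: List[Tuple[str, str, float]]) -> None:
    """Idempotent load: new keys are inserted, existing keys are updated in place."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(UPSERT, rows)
```

Because the conflict target is the natural key, a retried or manually dispatched run can reload the same day without creating duplicates.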