Writing

Notes from the pipeline.

Featured

Why your pipeline should assume failure

Idempotency, checkpointing, and self-healing recovery aren't luxuries; they're the price of sleeping through the night.

Jun 12, 2026 · 5 min

Exactly-once is a lie (and what to do instead)

End-to-end exactly-once is really at-least-once plus idempotency. Getting that right changes how you design every sink.

Jun 04, 2026 · 5 min

Taming late-arriving data with watermarks

Late data is normal, not an edge case. Watermarks and windowing handle stragglers without reprocessing the world.

May 28, 2026 · 5 min

Lakehouse migration: 2 PB without downtime

The dual-write, reconcile, and cutover strategy that moved a legacy warehouse to Delta without anyone noticing.

May 15, 2026 · 5 min

dbt tests that actually catch bugs

not_null and unique are table stakes. These are the semantic tests that have saved me from silent data incidents.

May 02, 2026 · 6 min

Backfills won't hurt you if you do this

Safe backfills need idempotency, partitioning, and a throttle. The pattern that lets me reprocess a year of data with confidence.

Apr 21, 2026 · 5 min

Data contracts before dashboards

Monitoring business metrics catches breakage too late. Catch it at the schema boundary with contracts, before it flows downstream.

Apr 09, 2026 · 5 min