Available for new roles

Junior Data Engineer

NAM DUONG HUU

I build pipelines that assume failure.

Idempotency · late-arriving data · deduplication · self-healing recovery

120+ pipelines shipped · 8PB+ data processed · 99.9% avg. uptime

01 / Stack

Tech Stack

The tools I use to move data reliably, at scale.

Languages

Python · SQL · Scala · Bash

Ingestion & Streaming

Apache Kafka · Flink · Kinesis · Debezium

Processing & Transform

Apache Spark · dbt · Databricks · Pandas

Orchestration

Apache Airflow · Dagster · Prefect

Storage & Warehouse

Snowflake · BigQuery · Delta Lake · S3

Cloud & Infra

AWS · Terraform · Docker · Kubernetes

02 / Architecture

Anatomy of a Pipeline

How raw events become decision-ready data, stage by stage.
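
A minimal sketch of how such a flow might be staged, tying together the themes named at the top of the page (deduplication, late-arriving data, idempotent loads). Everything here is an illustrative assumption rather than the production design: the stage names, the `event_id` and `event_time` fields, the one-hour `ALLOWED_LATENESS`, and the in-memory dict standing in for a warehouse table.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lateness window; a real pipeline would tune this per source.
ALLOWED_LATENESS = timedelta(hours=1)


def validate(events):
    """Drop events missing the fields the later stages rely on."""
    return [e for e in events if "event_id" in e and "event_time" in e]


def deduplicate(events):
    """Keep only the first occurrence of each event_id."""
    seen = set()
    unique = []
    for e in events:
        if e["event_id"] not in seen:
            seen.add(e["event_id"])
            unique.append(e)
    return unique


def split_late(events, watermark):
    """Separate on-time events from those older than the allowed lateness."""
    cutoff = watermark - ALLOWED_LATENESS
    on_time = [e for e in events if e["event_time"] >= cutoff]
    late = [e for e in events if e["event_time"] < cutoff]
    return on_time, late


def load(warehouse, events):
    """Upsert keyed by event_id, so re-running the same batch changes nothing."""
    for e in events:
        warehouse[e["event_id"]] = e
    return warehouse


def run_pipeline(raw_events, warehouse, watermark):
    """validate -> deduplicate -> split late arrivals -> idempotent load."""
    on_time, late = split_late(deduplicate(validate(raw_events)), watermark)
    load(warehouse, on_time)
    return late  # handed back so a separate backfill step can process them


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    raw = [
        {"event_id": "a", "event_time": now},
        {"event_id": "a", "event_time": now},  # duplicate delivery
        {"event_id": "b", "event_time": now - timedelta(hours=3)},  # late
        {"event_time": now},  # malformed: no event_id
    ]
    table = {}
    late_events = run_pipeline(raw, table, now)
    print(sorted(table), [e["event_id"] for e in late_events])
```

Because the load step is a keyed upsert, replaying the same batch after a failure leaves the table unchanged, which is the property that makes retries safe in this sketch.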

03 / Work

Selected Work

A few pipelines I'm proud of.

P-01 · streaming

Real-time Fraud Detection

Sub-second fraud scoring on 1.2M events/s using Kafka, Flink and a feature store feeding online ML models.

1.2M/s throughput · <300ms latency · ↓40% false positives

Kafka · Flink · Snowflake

P-02 · migration

Lakehouse Migration

Migrated a 2PB legacy warehouse to a Delta Lakehouse on Databricks, cutting compute cost 38% and query times in half.

2PB migrated · ↓38% cost · 2× faster

Spark · Databricks · Delta Lake

P-03 · platform

Self-serve Analytics

Built a dbt + Airflow + BigQuery platform with 300+ models and automated tests, giving 200 analysts trustworthy self-serve data.

300+ dbt models · 200 analysts · 99% tests pass

dbt · Airflow · BigQuery

P-04 · cdc

CDC Streaming Sync

Change-data-capture pipeline syncing 80+ Postgres tables to Snowflake in near real time with Debezium and Kafka Connect.

80+ tables · ~5s lag · 0 data loss

Debezium · Kafka Connect · Postgres
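
To make the change-data-capture idea concrete, here is a small sketch of applying Debezium-style change events idempotently. It assumes the unwrapped Debezium envelope (`op`, `before`, `after`, and `source.lsn` from the Postgres connector), a primary key column named `id`, and events arriving in LSN order; the `apply_change` helper and the dict standing in for the target table are hypothetical, and the real pipeline's Snowflake sink is not shown in the source.

```python
def apply_change(table, applied_lsn, event, key="id"):
    """Apply one change event to an in-memory table.

    Returns the highest LSN applied so far. Events at or below that LSN
    are skipped, so replaying a batch after a restart is a no-op.
    """
    lsn = event["source"]["lsn"]
    if lsn <= applied_lsn:
        return applied_lsn  # already applied

    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        table[row[key]] = row
    elif op == "d":  # delete: the key comes from the "before" image
        table.pop(event["before"][key], None)
    return lsn


def apply_batch(table, applied_lsn, events):
    """Apply events in order and return the new high-water LSN."""
    for event in events:
        applied_lsn = apply_change(table, applied_lsn, event)
    return applied_lsn


if __name__ == "__main__":
    events = [
        {"op": "c", "before": None,
         "after": {"id": 1, "name": "a"}, "source": {"lsn": 10}},
        {"op": "u", "before": {"id": 1, "name": "a"},
         "after": {"id": 1, "name": "b"}, "source": {"lsn": 20}},
    ]
    target = {}
    high_water = apply_batch(target, 0, events)
    # Replaying the same batch does not change the table.
    high_water = apply_batch(target, high_water, events)
    print(target, high_water)
```

Tracking a high-water LSN is one simple way to get replay safety; a warehouse sink would more likely persist that offset alongside the data and use a keyed merge.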

04 / Experience

Experience

Senior Data Engineer · FinScale

2023 – Present

Lead the real-time data platform powering fraud detection at 1.2M events/s.
Introduced observability and data contracts, cutting pipeline incidents by 70%.

Data Engineer · Tiki

2020 – 2023

Built a self-serve analytics platform with dbt and Airflow for 200+ analysts.
Led a 2PB lakehouse migration that reduced compute cost by 38%.

Data Engineer · VNG

2018 – 2020

Developed batch ETL pipelines on Spark for player-behavior data.
Automated data-quality checks, improving the reliability of reporting.

05 / Credentials

Certifications

AWS Certified Data Engineer

Amazon Web Services · 2024

Databricks Data Engineer Pro

Databricks · 2024

GCP Professional Data Engineer

Google Cloud · 2023

SnowPro Advanced: Data Engineer

Snowflake · 2023

Get in touch

Let's build something that scales.

Open to senior data engineering and platform roles.

duongnamhaui@gmail.com · GitHub · LinkedIn