1️⃣ What is CDC (Change Data Capture)?

CDC is a mechanism to capture and stream database changes (INSERT, UPDATE, DELETE) in real time, typically by reading transaction logs, to keep systems synchronized efficiently.

👉 Instead of repeatedly scanning the whole database, CDC captures only what changed, which keeps load on the source database low.


2️⃣ Why do we use CDC?

📌 Key Use Cases

  • Real-time data pipelines
    • Send DB changes to Kafka, analytics systems, etc.
  • Microservices sync
    • Keep multiple services/databases in sync
  • Data warehousing
    • Replicate production DB → data warehouse
  • Audit logs
    • Track who changed what and when

3️⃣ How CDC Works (Conceptually)

When a change happens in the DB:

User updates row → DB logs change → CDC captures change → sends to consumers
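
The flow above can be sketched with a toy in-memory model. This is only an illustration: ToyDatabase, its log list, and the capture function are hypothetical names invented here, and a real database log is a binary structure rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyDatabase:
    """In-memory stand-in for a database with an append-only change log."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # plays the role of the binlog / WAL

    def update(self, row_id: int, **changes) -> None:
        before = dict(self.rows.get(row_id, {}))
        after = {**before, **changes}
        self.rows[row_id] = after
        # Every write is also recorded in the log.
        self.log.append({"operation": "UPDATE", "before": before, "after": after})


def capture(db: ToyDatabase, position: int,
            consumers: list[Callable[[dict], None]]) -> int:
    """Deliver log entries from `position` onward to every consumer.

    Returns the new log position so the next call only sees newer entries.
    """
    for event in db.log[position:]:
        for consume in consumers:
            consume(event)
    return len(db.log)


db = ToyDatabase(rows={1: {"id": 1, "name": "Om"}})
received: list[dict] = []

db.update(1, name="Omi")                       # user updates row, DB logs change
position = capture(db, 0, [received.append])   # CDC captures change, sends to consumers
```

Note that the application code only calls update; it never talks to the consumers directly. That decoupling is the main point of the flow.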

🔍 Common ways to implement CDC

1. Log-based CDC (⭐ Best)

  • Reads database transaction logs
  • Example tools:
    • Debezium
    • Apache Kafka Connect (runs CDC source connectors such as Debezium)
  • 📌 Pros:
    • Low performance impact on DB (no extra queries or triggers on the write path)
    • Captures all changes reliably
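
A rough sketch of the idea behind log-based capture: the reader remembers how far into the log it has read and resumes from that offset, so nothing is skipped between runs. The newline-delimited JSON log file, the offset file, and the function names below are assumptions made for illustration; real tools such as Debezium read the binlog or WAL through the database's replication interface instead.

```python
import json
import tempfile
from pathlib import Path


def read_new_entries(log_path: Path, offset_path: Path) -> list[dict]:
    """Return entries appended since the last checkpoint, then advance it."""
    offset = int(offset_path.read_text()) if offset_path.exists() else 0
    with log_path.open("rb") as log:
        log.seek(offset)
        data = log.read()
    # Only consume complete lines; a partially written line waits for the next call.
    end = data.rfind(b"\n") + 1
    entries = [json.loads(line) for line in data[:end].splitlines() if line.strip()]
    offset_path.write_text(str(offset + end))
    return entries


def append_entry(log_path: Path, entry: dict) -> None:
    """Simulate the database appending one change to its log."""
    with log_path.open("ab") as log:
        log.write(json.dumps(entry).encode() + b"\n")


workdir = Path(tempfile.mkdtemp())
log_path = workdir / "changes.log"        # hypothetical log file
offset_path = workdir / "reader.offset"   # hypothetical checkpoint file

append_entry(log_path, {"op": "INSERT", "id": 1})
append_entry(log_path, {"op": "UPDATE", "id": 1})
first_batch = read_new_entries(log_path, offset_path)

append_entry(log_path, {"op": "DELETE", "id": 1})
second_batch = read_new_entries(log_path, offset_path)
```

Because the log records every write in order, the DELETE shows up in the second batch like any other change, and a reader that restarts simply continues from its stored offset.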

2. Trigger-based CDC

  • DB triggers fire on INSERT/UPDATE/DELETE
  • Store changes in another table
  • 📌 Pros:
    • Easy to implement
  • ⚠️ Cons:
    • Slows DB writes (trigger work runs inside every transaction)
    • Hard to scale
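
A minimal sketch of trigger-based capture, using SQLite through Python's built-in sqlite3 module so it runs without a server. The users_changes table and the trigger names are hypothetical; trigger syntax differs somewhat in MySQL and Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical change table that the triggers write into.
CREATE TABLE users_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    operation TEXT NOT NULL,
    user_id   INTEGER NOT NULL,
    old_name  TEXT,
    new_name  TEXT
);

CREATE TRIGGER users_after_insert AFTER INSERT ON users BEGIN
    INSERT INTO users_changes (operation, user_id, old_name, new_name)
    VALUES ('INSERT', NEW.id, NULL, NEW.name);
END;

CREATE TRIGGER users_after_update AFTER UPDATE ON users BEGIN
    INSERT INTO users_changes (operation, user_id, old_name, new_name)
    VALUES ('UPDATE', NEW.id, OLD.name, NEW.name);
END;

CREATE TRIGGER users_after_delete AFTER DELETE ON users BEGIN
    INSERT INTO users_changes (operation, user_id, old_name, new_name)
    VALUES ('DELETE', OLD.id, OLD.name, NULL);
END;
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'Om')")
conn.execute("UPDATE users SET name = 'Omi' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

# Each write above produced one extra row in the change table.
changes = conn.execute(
    "SELECT operation, user_id, old_name, new_name "
    "FROM users_changes ORDER BY change_id"
).fetchall()
```

Every write to users now performs a second write to users_changes inside the same transaction, which is where the slowdown listed under Cons comes from.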

3. Timestamp / Polling-based CDC

  • Query rows where updated_at > last_checked_time
  • 📌 Pros:
    • Simple
  • ⚠️ Cons:
    • Misses hard deletes (a deleted row no longer matches the query)
    • Not truly real-time (changes are seen only at the next poll)
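
The polling approach and its blind spot for deletes can be sketched as follows, again with SQLite via sqlite3. Integer timestamps, the second user "Asha", and the poll_changes helper are assumptions chosen to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Om", 100), (2, "Asha", 100)],
)


def poll_changes(conn: sqlite3.Connection, last_checked_time: int):
    """Return rows modified after last_checked_time plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_checked_time,),
    ).fetchall()
    new_mark = max((row[2] for row in rows), default=last_checked_time)
    return rows, new_mark


initial, mark = poll_changes(conn, 0)   # first poll sees both rows

conn.execute("UPDATE users SET name = 'Omi', updated_at = 110 WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 2")

changed, mark = poll_changes(conn, mark)   # second poll
```

The second poll returns only the updated row for id 1. The deletion of id 2 leaves no row behind to match the query, so the poller never learns about it.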

4️⃣ Simple Example

Table:

users
id | name | updated_at

Change:

UPDATE users SET name = 'Omi' WHERE id = 1;

CDC Output Event:

{
  "operation": "UPDATE",
  "before": { "id": 1, "name": "Om" },
  "after": { "id": 1, "name": "Omi" },
  "timestamp": "2026-04-14T04:00:00Z"
}
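
As a small illustration of how a consumer might use the before and after images, the sketch below works out which columns changed. The changed_fields helper is a hypothetical name, and the event shape is the simplified one shown above rather than the exact envelope of any particular tool.

```python
import json

event_json = """
{
  "operation": "UPDATE",
  "before": { "id": 1, "name": "Om" },
  "after": { "id": 1, "name": "Omi" },
  "timestamp": "2026-04-14T04:00:00Z"
}
"""


def changed_fields(event: dict) -> dict:
    """Map each column whose value differs to an (old, new) pair."""
    before = event.get("before") or {}
    after = event.get("after") or {}
    return {
        column: (before.get(column), after.get(column))
        for column in sorted(before.keys() | after.keys())
        if before.get(column) != after.get(column)
    }


event = json.loads(event_json)
diff = changed_fields(event)
```

Here diff contains a single entry for name, since id is the same in both images.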

5️⃣ Where CDC fits in System Design

Architecture flow:

                ┌──────────────────────┐
                │      Application     │
                │ (User / Service API) │
                └──────────┬───────────┘
                           │
                           ▼
                ┌──────────────────────┐
                │      Database        │
                │ (MySQL / Postgres)   │
                └──────────┬───────────┘
                           │
        (Writes: INSERT / UPDATE / DELETE)
                           │
                           ▼
                ┌──────────────────────┐
                │  Transaction Logs    │
                │ (Binlog / WAL)       │
                └──────────┬───────────┘
                           │
                 (Log-based CDC reads)
                           │
                           ▼
                ┌──────────────────────┐
                │   CDC Tool           │
                │  (Debezium)          │
                └──────────┬───────────┘
                           │
                (Convert to events)
                           │
                           ▼
                ┌──────────────────────┐
                │   Message Broker     │
                │     (Kafka)          │
                └──────────┬───────────┘
                           │
        ┌───────────────┬──┴────────────┬───────────────┐
        ▼               ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Analytics  │ │   Cache      │ │ Search Index │ │ Notifications│
│  (Warehouse) │ │  (Redis)     │ │  (Elastic)   │ │   Service    │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
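
The bottom row of the diagram can be sketched as independent handlers that each apply the same event stream to their own store. Plain dictionaries stand in for Redis and the search index here, and the handler names, the sample rows, and the convention that INSERT events carry "before": None and DELETE events carry "after": None are assumptions for illustration.

```python
def apply_to_cache(cache: dict, event: dict) -> None:
    """Keep a key-value cache (standing in for Redis) in step with the table."""
    if event["operation"] == "DELETE":
        cache.pop(event["before"]["id"], None)
    else:  # INSERT or UPDATE: overwrite with the latest row image
        row = event["after"]
        cache[row["id"]] = row


def apply_to_search_index(index: dict, event: dict) -> None:
    """Maintain a toy inverted index: lowercase name -> set of row ids."""
    before, after = event.get("before"), event.get("after")
    if before:
        index.get(before["name"].lower(), set()).discard(before["id"])
    if after:
        index.setdefault(after["name"].lower(), set()).add(after["id"])


events = [
    {"operation": "INSERT", "before": None, "after": {"id": 1, "name": "Om"}},
    {"operation": "UPDATE", "before": {"id": 1, "name": "Om"},
     "after": {"id": 1, "name": "Omi"}},
    {"operation": "INSERT", "before": None, "after": {"id": 2, "name": "Asha"}},
    {"operation": "DELETE", "before": {"id": 2, "name": "Asha"}, "after": None},
]

cache: dict = {}
index: dict = {}
for event in events:
    # In a real deployment each consumer would read the broker topic on its own.
    apply_to_cache(cache, event)
    apply_to_search_index(index, event)
```

After the loop the cache holds only the current row for id 1, and a lookup of "omi" in the index returns that id. Neither store was written to by the application itself.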

6️⃣ Key Benefits

  • Real-time data sync
  • Efficient (no full table scans)
  • Event-driven architecture
  • Enables streaming systems