
What is change data capture?

Change data capture (CDC) is a technique for tracking row-level changes (inserts, updates, and deletes) in a database and delivering them as a real-time event stream to downstream systems. Rather than periodically scanning entire tables for changes, as batch ETL jobs do, CDC captures each change as it is committed and delivers it immediately.

How CDC works in RisingWave

RisingWave provides native CDC connectors that connect directly to source databases without requiring Kafka, Debezium, or any other middleware. This simplifies your architecture and reduces operational overhead.
```sql
-- Create a CDC table from PostgreSQL
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  amount DECIMAL,
  status VARCHAR,
  created_at TIMESTAMP
) WITH (
  connector = 'postgres-cdc',
  hostname = 'db.example.com',
  port = '5432',
  username = 'repl_user',
  password = '<your_password>',
  database.name = 'production',
  schema.name = 'public',
  table.name = 'orders'
);
```
Once created, the CDC table:
  1. Performs an initial snapshot of the existing data in the source table.
  2. Streams ongoing changes in real time using the database’s replication protocol (WAL for PostgreSQL, binlog for MySQL).
  3. Maintains transactional consistency: changes are applied in the same order as they occurred in the source database.
  4. Recovers automatically from failures using checkpoint-based restoration.
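Because the CDC table stays in sync with its source, you can build on it like any other table. As an illustration (the view name order_totals_by_status is hypothetical), a materialized view over the orders table above is maintained incrementally as inserts, updates, and deletes arrive from PostgreSQL:

```sql
-- Illustrative view over the CDC-backed orders table.
-- RisingWave maintains it incrementally as changes arrive from PostgreSQL.
CREATE MATERIALIZED VIEW order_totals_by_status AS
SELECT
  status,
  COUNT(*)    AS order_count,
  SUM(amount) AS total_amount
FROM orders
GROUP BY status;

-- Query the view like a regular table; results reflect the upstream changes processed so far.
SELECT * FROM order_totals_by_status ORDER BY status;
```

An upstream update that moves an order from one status to another is applied as a retraction from the old group and an addition to the new one, so the totals stay correct without rescanning orders.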

Supported CDC sources

| Database | Connector | Replication method |
| --- | --- | --- |
| PostgreSQL | postgres-cdc | Logical replication (WAL) |
| MySQL | mysql-cdc | Binlog replication |
| SQL Server | sqlserver-cdc | SQL Server CDC tables |
| MongoDB | mongodb-cdc | Change streams |
All native CDC connectors connect directly to the source database — no Kafka cluster or Debezium deployment is required.
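Because each connector reads the database's own replication stream, that stream must be enabled on the source. For PostgreSQL, a minimal setup might look like the following, run as a superuser on the upstream server. The role name and password reuse the placeholders from the examples on this page. The exact set of privileges the connector needs (for example, to create the publication it reads from) is listed in RisingWave's PostgreSQL CDC guide, so treat this as a starting point rather than a complete checklist. Managed services such as Amazon RDS typically enable logical replication through a parameter group setting (rds.logical_replication) instead of ALTER SYSTEM.

```sql
-- Run on the upstream PostgreSQL server as a superuser, not in RisingWave.
-- 1. Enable logical decoding; the new value takes effect only after a server restart.
ALTER SYSTEM SET wal_level = logical;

-- 2. After restarting PostgreSQL, confirm the setting.
SHOW wal_level;

-- 3. Create a login role that is allowed to open replication connections.
CREATE ROLE repl_user WITH LOGIN REPLICATION PASSWORD '<your_password>';

-- 4. Give the role read access for the initial snapshot.
GRANT CONNECT ON DATABASE production TO repl_user;
GRANT USAGE ON SCHEMA public TO repl_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO repl_user;
```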

Shared source for multi-table CDC

When you need to replicate multiple tables from the same database, RisingWave supports shared sources to avoid creating a separate replication connection for each table:
```sql
-- Create a shared PostgreSQL CDC source
CREATE SOURCE pg_source WITH (
  connector = 'postgres-cdc',
  hostname = 'db.example.com',
  port = '5432',
  username = 'repl_user',
  password = '<your_password>',
  database.name = 'production'
);

-- Create individual tables from the shared source
CREATE TABLE orders (*) FROM pg_source TABLE 'public.orders';
CREATE TABLE customers (*) FROM pg_source TABLE 'public.customers';
CREATE TABLE products (*) FROM pg_source TABLE 'public.products';
```
Because all tables created from a shared source read from one replication stream (a single replication slot, in the case of PostgreSQL), a shared source reduces the load on the upstream database.
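To check this on the upstream side, you can list the logical replication slots in PostgreSQL. With pg_source in place, you would expect one slot serving it rather than one per table. The exact slot name depends on how the source was configured, and other logical replication consumers may hold slots of their own.

```sql
-- Run on the upstream PostgreSQL server (for example, in psql), not in RisingWave.
-- active = true means a consumer is currently attached to the slot.
SELECT slot_name, plugin, database, active
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

A slot that stays inactive still prevents PostgreSQL from removing WAL it has not yet delivered, so this query is also a useful place to spot slots left behind by dropped or abandoned sources.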

CDC vs. Kafka-based ingestion

You can also ingest CDC events through Kafka using Debezium or other CDC tools. RisingWave supports Debezium JSON and Maxwell JSON formats from Kafka topics. However, native CDC has significant advantages:
| | Native CDC | Kafka + Debezium |
| --- | --- | --- |
| Infrastructure | RisingWave + source database only | Kafka + Debezium + source database |
| Operational complexity | Low: no middleware to manage | High: Kafka, Kafka Connect, and Debezium must all be managed |
| Latency | Lower: direct connection | Higher: additional hop through Kafka |
| Shared source support | Yes | No (requires separate Kafka topics per table) |
| Multi-table ingestion | Single replication slot | Separate connector per table |
Use native CDC when you want the simplest architecture with the lowest latency. Use Kafka-based CDC when you already run Kafka and need to fan out CDC events to multiple consumers.
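For comparison, ingesting the same orders table from a Debezium topic might look roughly like this. The topic name assumes Debezium's default per-table naming (prefix, schema, table) with a hypothetical prefix of production, and the broker address is a placeholder. Because Debezium events carry updates and deletes, the target is a table with a primary key. Check how your Debezium connector serializes decimal and temporal columns (for example, its decimal.handling.mode setting) so that they match the declared column types.

```sql
-- Hypothetical topic and broker; Debezium publishes one topic per captured table.
CREATE TABLE orders_from_kafka (
  order_id INT PRIMARY KEY,
  customer_id INT,
  amount DECIMAL,
  status VARCHAR,
  created_at TIMESTAMP
) WITH (
  connector = 'kafka',
  topic = 'production.public.orders',
  properties.bootstrap.server = 'kafka.example.com:9092'
) FORMAT DEBEZIUM ENCODE JSON;
```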

Common CDC use cases

  • Real-time analytics: Replicate operational database changes into RisingWave for real-time dashboards and monitoring.
  • Streaming ETL: Transform and enrich CDC data with materialized views, then sink results to a data warehouse or data lake.
  • Cache invalidation: Track database changes and update caches or search indexes in real time.
  • Event-driven architectures: Convert database changes into events for downstream microservices.
  • Data synchronization: Keep multiple systems in sync by replicating changes across databases.
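The streaming ETL and event-driven patterns above can be combined in a short sketch: enrich CDC data in a materialized view, then publish the result to Kafka for downstream consumers. It assumes the orders table defined earlier and a customers CDC table with hypothetical customer_id and region columns; the view, sink, topic, and broker names are placeholders. Loading into a data warehouse or data lake would use a different sink connector.

```sql
-- Assumes: orders (from the CDC examples above) and a customers CDC table
-- with hypothetical customer_id and region columns.
CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT
  c.region,
  COUNT(*)      AS order_count,
  SUM(o.amount) AS revenue
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
GROUP BY c.region;

-- Publish each change to the view as an upsert keyed by region.
-- The topic and broker address are placeholders.
CREATE SINK revenue_by_region_sink FROM revenue_by_region
WITH (
  connector = 'kafka',
  properties.bootstrap.server = 'kafka.example.com:9092',
  topic = 'revenue-by-region',
  primary_key = 'region'
) FORMAT UPSERT ENCODE JSON;
```

An upsert format fits here because each region's aggregate is overwritten as orders are inserted, updated, or deleted upstream, so consumers only need the latest value per key.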