Skip to main content
Change data capture (CDC) lets you replicate real-time data changes from operational databases into RisingWave for stream processing and analytics. RisingWave provides native CDC connectors for PostgreSQL, MySQL, SQL Server, and MongoDB — no Kafka, no Debezium deployment, and no additional middleware required.

How RisingWave implements CDC

RisingWave reads database change streams differently depending on the source database:
  • PostgreSQL: RisingWave connects directly to PostgreSQL’s logical replication slot and reads the write-ahead log (WAL) using the native replication protocol. No Debezium or Kafka is involved.
  • MySQL and SQL Server: RisingWave embeds the Debezium Embedded Engine in-process. The engine reads binlog (MySQL) or CDC (SQL Server) directly, without requiring a Kafka cluster or Kafka Connect worker.
  • MongoDB: RisingWave reads MongoDB change streams via the native MongoDB change stream API.
In all cases, changes flow directly from the source database into RisingWave’s streaming engine with transactional consistency and automatic checkpoint recovery.

CDC sources and tables

CDC connectors in RisingWave require a Table, not just a Source. Because a CDC stream contains inserts, updates, and deletes, RisingWave must materialize the current state of the data to correctly apply each operation. A Table provides this state store, along with a primary key and offset tracking for fault-tolerant recovery. You can create CDC tables in two ways: Direct table with embedded connector:
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR,
  email VARCHAR
) WITH (
  connector = 'postgres-cdc',
  hostname = '127.0.0.1',
  port = '5432',
  username = 'admin',
  password = 'secret',
  database.name = 'mydb',
  schema.name = 'public',
  table.name = 'users'
);
Shared source with multiple derived tables (recommended for multi-table replication):
CREATE SOURCE pg_src WITH (
  connector = 'postgres-cdc',
  hostname = '127.0.0.1',
  port = '5432',
  username = 'admin',
  password = 'secret',
  database.name = 'mydb',
  slot.name = 'rw_slot'
);

CREATE TABLE users (...) FROM pg_src TABLE 'public.users';
CREATE TABLE orders (...) FROM pg_src TABLE 'public.orders';

Shared sources and transactional consistency

When replicating multiple tables from the same database, use a shared source. A shared source establishes a single replication connection and fans out change events to all derived tables, ensuring that cross-table transactions are applied atomically. If each table created its own separate replication connection, a transaction updating both users and orders could arrive at the two tables at different times — breaking transactional consistency. With a shared source, all tables consume the same ordered change stream, so multi-table transactions are always applied together. Shared sources are supported for PostgreSQL CDC, MySQL CDC, and SQL Server CDC.

Connector reference

Kafka-based CDC ingestion

You can also ingest CDC events through an event streaming system such as Apache Kafka. In this approach, a separate CDC tool (such as standalone Debezium with Kafka Connect) captures changes from the source database and publishes them as Kafka topics. RisingWave then consumes those topics as a Kafka source, with the message format set to match the CDC tool’s output (for example, FORMAT DEBEZIUM ENCODE JSON). This architecture is useful when Kafka is already your central event bus and other downstream systems also need to consume the same change stream. For architectures where RisingWave is the primary consumer, the native CDC connectors are simpler to operate.