How RisingWave implements CDC
RisingWave reads database change streams differently depending on the source database:
- PostgreSQL: RisingWave connects directly to PostgreSQL’s logical replication slot and reads the write-ahead log (WAL) using the native replication protocol. No Debezium or Kafka is involved.
- MySQL and SQL Server: RisingWave embeds the Debezium Embedded Engine in-process. The engine reads binlog (MySQL) or CDC (SQL Server) directly, without requiring a Kafka cluster or Kafka Connect worker.
- MongoDB: RisingWave reads MongoDB change streams via the native MongoDB change stream API.
CDC sources and tables
CDC connectors in RisingWave require a Table, not just a Source. Because a CDC stream contains inserts, updates, and deletes, RisingWave must materialize the current state of the data to correctly apply each operation. A Table provides this state store, along with a primary key and offset tracking for fault-tolerant recovery. You can create CDC tables in two ways: as a direct table with an embedded connector, or as a table derived from a shared source.
Shared sources and transactional consistency
When replicating multiple tables from the same database, use a shared source. A shared source establishes a single replication connection and fans out change events to all derived tables, ensuring that cross-table transactions are applied atomically. If each table created its own separate replication connection, a transaction updating both users and orders could arrive at the two tables at different times, breaking transactional consistency. With a shared source, all tables consume the same ordered change stream, so multi-table transactions are always applied together.
Shared sources are supported for PostgreSQL CDC, MySQL CDC, and SQL Server CDC.
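As a rough illustration of the shared-source pattern, the sketch below defines one PostgreSQL CDC source and derives two tables from it. The host, credentials, database name, slot name, and column definitions are placeholders invented for this example, and the property names reflect the PostgreSQL CDC connector as commonly documented, so check them against the connector reference for your RisingWave version.

```sql
-- One replication connection for the whole upstream database.
-- All connection values below are hypothetical placeholders.
CREATE SOURCE pg_mydb WITH (
    connector = 'postgres-cdc',
    hostname = '127.0.0.1',
    port = '5432',
    username = 'postgres',
    password = 'example-password',
    database.name = 'mydb',
    slot.name = 'mydb_slot'
);

-- Each derived table names its upstream table and declares a primary key,
-- which RisingWave needs in order to apply updates and deletes.
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR
) FROM pg_mydb TABLE 'public.users';

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    user_id INT,
    amount DECIMAL
) FROM pg_mydb TABLE 'public.orders';
```

Because both tables read from pg_mydb, a transaction that touches users and orders upstream reaches them through the same ordered change stream.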
Connector reference
Kafka-based CDC ingestion
You can also ingest CDC events through an event streaming system such as Apache Kafka. In this approach, a separate CDC tool (such as standalone Debezium with Kafka Connect) captures changes from the source database and publishes them as Kafka topics. RisingWave then consumes those topics as a Kafka source, with the message format set to match the CDC tool’s output (for example, FORMAT DEBEZIUM ENCODE JSON).
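A minimal sketch of the consuming side might look like the following, assuming Debezium already publishes JSON change events for an orders table. The topic name, broker address, and columns are hypothetical, and the Kafka property names should be verified against the Kafka connector reference for your version.

```sql
-- Table backed by a Kafka topic that carries Debezium-formatted change events.
-- The primary key lets RisingWave apply updates and deletes to the right row.
CREATE TABLE orders_from_kafka (
    order_id INT PRIMARY KEY,
    user_id INT,
    amount DECIMAL
) WITH (
    connector = 'kafka',
    topic = 'dbserver1.public.orders',            -- hypothetical topic name
    properties.bootstrap.server = 'broker:9092',  -- hypothetical broker address
    scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;
```

The sketch uses CREATE TABLE rather than CREATE SOURCE for the same reason given for native CDC connectors: applying updates and deletes requires materialized state keyed by a primary key.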
This architecture is useful when Kafka is already your central event bus and other downstream systems also need to consume the same change stream. For architectures where RisingWave is the primary consumer, the native CDC connectors are simpler to operate.