Summary
| | Standalone Debezium + Kafka | RisingWave |
|---|---|---|
| CDC engine | Debezium Kafka Connect connectors | Debezium Embedded Engine (in-process, for MySQL/SQL Server); direct WAL replication (PostgreSQL) |
| Kafka required | Yes — Kafka brokers, Kafka Connect cluster, schema registry | No — CDC runs in-process inside RisingWave |
| Configuration | Connector JSON, Kafka Connect topology | Single CREATE SOURCE SQL statement |
| Transformation | Requires separate engine (Flink, ksqlDB, Spark) | Built-in SQL-based continuous transformations |
| Query serving | None | Materialized views served at 10–20 ms p99 via PostgreSQL wire protocol |
| Fan-out to many consumers | Yes — any Kafka consumer can read the topic | Via sinks to Kafka or other destinations |
| Operational complexity | Kafka brokers + Kafka Connect workers + processing engine + serving DB | Single deployment |
How RisingWave uses Debezium internally
RisingWave does not reinvent CDC from scratch. For MySQL and SQL Server, RisingWave’s CDC connectors are powered by the Debezium Embedded Engine, the same battle-tested library that underlies standalone Debezium, running in-process inside RisingWave without any Kafka infrastructure. For PostgreSQL, RisingWave reads the write-ahead log directly via logical replication, which provides the same semantics with native PostgreSQL protocol support. This architecture means RisingWave inherits Debezium’s maturity and database compatibility while eliminating the operational overhead of deploying and managing a separate Kafka cluster and Kafka Connect workers.
Enhancements RisingWave makes to the embedded engine
RisingWave has made six major improvements to the embedded Debezium engine beyond what standalone Debezium provides:
- Lock-free snapshots with parallel stitching — Records snapshot boundaries (LSN/binlog positions) while simultaneously streaming snapshot data and WAL changes, eliminating production database locks during the initial backfill.
- Parallel backfill — Uses primary key range slicing with dynamic thread adjustment based on source database load, significantly accelerating the initial snapshot of large tables.
- Multi-cloud schema history — Pluggable storage for schema history supporting S3, GCS, Azure Blob, and Alibaba OSS — replacing Debezium’s Kafka-based schema history with cloud-native alternatives.
- Memory-efficient schema history — Segments history into time or size-based chunks and loads only the portion needed for current processing, reducing memory footprint for long-running deployments.
- TOAST column completion — Automatically replaces PostgreSQL TOAST placeholders with actual historical values before delivering changes downstream, ensuring complete records.
- Direct schema change detection — Leverages PostgreSQL’s replication protocol to detect DDL changes without polling, enabling immediate schema evolution handling.
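To make the parallel backfill idea above more concrete, here is a minimal sketch of primary key range slicing. It is an illustration only, not RisingWave's actual implementation: the function names (`slice_pk_range`, `parallel_backfill`) and the `fetch_slice` callback are hypothetical, the primary key is assumed to be an integer with roughly even distribution, and the worker count is fixed rather than adjusted dynamically to source database load.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def slice_pk_range(min_pk: int, max_pk: int, num_slices: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_pk, max_pk] into contiguous
    half-open slices [lo, hi) of near-equal size (hypothetical helper)."""
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    if max_pk < min_pk:
        return []  # empty table: nothing to backfill
    total = max_pk - min_pk + 1
    num_slices = min(num_slices, total)  # never create empty slices
    base, extra = divmod(total, num_slices)
    slices = []
    lo = min_pk
    for i in range(num_slices):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        slices.append((lo, lo + size))
        lo += size
    return slices


def parallel_backfill(
    fetch_slice: Callable[[int, int], list],
    slices: List[Tuple[int, int]],
    max_workers: int = 4,
) -> list:
    """Read every slice on a thread pool and concatenate the rows in
    slice order. `fetch_slice(lo, hi)` stands in for a query such as
    SELECT * FROM t WHERE pk >= lo AND pk < hi."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input slices.
        chunks = pool.map(lambda s: fetch_slice(*s), slices)
        return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    # Stand-in for a database read: each "row" is just its key.
    fake_fetch = lambda lo, hi: list(range(lo, hi))
    print(slice_pk_range(1, 10, 3))
    print(parallel_backfill(fake_fetch, slice_pk_range(1, 10, 3)))
```

Because the slices are disjoint and cover the whole key range, each worker can read its portion independently. A production system would also need to reconcile the snapshot rows with changes streamed during the backfill, which this sketch does not attempt.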