| ETL / Data Enrichment | | | |
| Real-time ETL | RisingWave continuously transforms and enriches data streams before they land in Iceberg. | Delivers low-latency insights and reduces the need for external ETL pipelines such as Spark or Flink. | Spark, Flink |
| ELT / Data Replication | | | |
| CDC-based ELT | Built on the Debezium embedded engine, RisingWave continuously replicates changes from operational databases into Iceberg. | Simplifies data synchronization and guarantees consistency between operational databases and analytical Iceberg tables. | Flink CDC, Kafka Connect |
| Writing to Iceberg | | | |
| Built-in Compaction | RisingWave’s internal compaction service merges small files and snapshots efficiently. | Reduces query overhead and storage costs, and improves Iceberg query performance. | Spark/EMR |
| Controllable Commit Frequency | Users can configure commit frequency based on workload and freshness requirements. | Balances performance, cost, and data freshness for diverse workloads. | Kafka / Kafka Connect |
| Exactly-Once Semantics | Guarantees no duplicate or missing records while respecting primary key constraints from upstream sources. | Maintains strong data consistency and correctness in streaming pipelines. | Flink (no buffering; commits are coupled to checkpoints) |
| Embedded Log Store | Uses an internal log store (similar to an embedded Kafka) for buffering data before committing. | Enables durability, backpressure handling, and fault-tolerant data delivery. | Kafka / Kafka Connect |
| Mutable Stream Modes | Supports two modes: Merge-on-Read (MoR) and Copy-on-Write (CoW). MoR writes data files and delete files for high freshness; CoW writes compacted data files for compatibility. | Offers flexibility to balance freshness and compatibility with different query engines. | - |
| Reading from Iceberg | | | |
| One-time Loading (Append-only) | Loads append-only Iceberg tables once for initial data import or snapshot analysis. | Ideal for bootstrapping analytical pipelines or ad-hoc analysis without maintaining state. | Spark, Flink |
| One-time Loading (Mutable, CoW) | Loads CoW-style Iceberg tables once, applying overwrite semantics to ensure consistent snapshot reads. | Enables consistent point-in-time analysis for mutable tables. | Spark, Flink |
| One-time Loading (Mutable, MoR) | Loads MoR-style Iceberg tables once, reading both data and delete files. | Reconstructs the latest table state accurately by applying equality deletes. | Spark, Flink |
| Periodic Loading (Append-only) | Periodically imports new appended data from Iceberg snapshots. | Suitable for near-real-time dashboards that don’t require full streaming. | Spark |
| Periodic Loading (Mutable, CoW) | Periodically reads compacted CoW Iceberg tables at scheduled intervals. | Simplifies incremental updates for mutable datasets while reducing resource usage. | Spark |
| Periodic Loading (Mutable, MoR) | Periodically synchronizes MoR tables, applying delete files so that updates and deletes are reflected. | Ensures correctness in workloads where updates or deletes occur. | - |
| Continuous Loading (Append-only) | Continuously monitors append-only Iceberg tables and ingests new data as it arrives. | Enables true streaming from batch data sources with low latency. | Spark, Flink |
| Continuous Loading (Mutable, CoW) | Continuously loads compacted CoW tables to maintain up-to-date materialized views. | Combines near-real-time freshness with compatibility for engines lacking delete-file support. | - |
| Transparent Caching Layer (WIP) | A planned feature exposing an S3-compatible endpoint for caching and query acceleration. | Improves read efficiency while maintaining open, interoperable access. | |
| Compaction & Maintenance | | | |
| Automatic Optimization | Compacts small files in the background using smart strategies to maintain healthy table structure. | Improves query speed, reduces metadata overhead, and optimizes storage usage. | Databricks managed Iceberg tables, S3 Tables, etc. |
| Universal Compatibility | Works with object stores on all major clouds (AWS, Azure, GCP), even for engines lacking delete-file support. | Makes Iceberg maintenance cloud-agnostic and engine-independent. | Databricks managed Iceberg tables, S3 Tables, etc. |
| Iceberg-rust Enhancements | RisingWave maintains a fork of Iceberg-rust with support for the V2 spec, a partition writer, and fast append. | Extends Iceberg’s native capabilities for better streaming integration. | - |
| Observability & Custom Policies | Includes status tracking and fine-grained compaction policies. | Allows users to customize optimization behavior per workload, improving control and transparency. | - |
| Iceberg Transformation | | | |
| Cascading Incremental MVs | Supports cascading materialized views on top of Iceberg tables, maintained incrementally. | Enables a Medallion-style architecture (Bronze–Silver–Gold) and reduces compute cost for derived data. | Spark Streaming, Flink |
| Postgres-style Transformation Operators | Includes dozens of PostgreSQL-compatible streaming operators such as joins, windows, and aggregations. | Provides a rich transformation layer without relying on external frameworks. | Spark Streaming, Flink |
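
The Merge-on-Read rows in the table mention reading both data files and delete files to reconstruct the latest table state. As a rough illustration of that idea, the sketch below models equality deletes in plain Python. It is a simplified toy model, not RisingWave's or Iceberg's implementation: the `DataFile` and `EqualityDeleteFile` classes and the `read_merge_on_read` function are hypothetical names, and position deletes, partitioning, and schemas are ignored. The only rule borrowed from the Iceberg spec is that an equality delete applies to data files with a strictly lower sequence number.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical stand-in for an Iceberg data file."""
    sequence_number: int
    rows: list  # each row is a dict of column name -> value


@dataclass
class EqualityDeleteFile:
    """Hypothetical stand-in for an Iceberg equality delete file."""
    sequence_number: int
    key_columns: tuple  # columns whose values identify deleted rows
    keys: frozenset     # set of key tuples that are deleted


def read_merge_on_read(data_files, delete_files):
    """Return live rows after applying equality deletes.

    A delete file only affects data files written before it, i.e. those
    with a strictly lower sequence number.
    """
    live = []
    for data_file in data_files:
        applicable = [
            d for d in delete_files
            if d.sequence_number > data_file.sequence_number
        ]
        for row in data_file.rows:
            deleted = any(
                tuple(row[c] for c in d.key_columns) in d.keys
                for d in applicable
            )
            if not deleted:
                live.append(row)
    return live


# An update to id=1 is written as "delete old row + insert new row"
# in the same commit (sequence number 3).
data_files = [
    DataFile(1, [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]),
    DataFile(3, [{"id": 1, "v": "a2"}]),
]
delete_files = [EqualityDeleteFile(3, ("id",), frozenset({(1,)}))]

print(read_merge_on_read(data_files, delete_files))
```

In this model the old `id=1` row is filtered out while the new one survives, because the delete shares a sequence number with the replacement row and therefore does not apply to it. A Copy-on-Write table avoids this read-time merge by rewriting data files, which is why engines without delete-file support can still read it.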