RisingWave’s Iceberg journey

| Time | Event |
| --- | --- |
| Sep 2022 | Started developing the Iceberg sink |
| Nov 2022 | Iceberg sink (append-only, Java based) |
| Jan 2023 | Iceberg sink (mutable, Java based) |
| Mar 2023 | Iceberg sink (Java based) |
| Jun 2023 | Initiated Iceberg-rust |
| Sep 2023 | Iceberg sink (Rust based) |
| Apr 2024 | Iceberg source & Iceberg ad-hoc query |
| Jun 2024 | Support log store buffering and tunable freshness for Iceberg sink |
| Nov 2024 | Released Nimtable, the control plane for Apache Iceberg |
| Nov 2024 | Rust-based Iceberg compaction |
| Nov 2024 | Released Iceberg Table Engine |
| Apr 2025 | Support AWS S3 Tables, Snowflake Catalog, and Databricks Catalog integration |
| May 2025 | Support exactly-once for Iceberg writer |
| Sep 2025 | Support CoW mode for Iceberg Table Engine and Iceberg compaction |

Feature comparison

Writing to Iceberg

The following table compares RisingWave with other ETL/ELT tools for Iceberg, such as Spark Streaming, Flink, and Fivetran.
| | RisingWave | Spark Streaming | Apache Flink | Fivetran |
| --- | --- | --- | --- | --- |
| Interface | Postgres-compatible | Spark SQL/Java/Python | Flink SQL/Java/Python | No code |
| Position | ETL/ELT | ETL/ELT | ETL/ELT | ELT only |
| Exactly-once | Yes (decoupled from checkpoints) | No | Yes (no buffering, coupled with checkpoints) | No |
| Schema evolution (ELT case, e.g., replicating Postgres to Iceberg) | Partially | No | Partially (with Flink CDC integration) | No |
| Data source support | Rich (CDC, message queues, batch, webhook, etc.) | Rich | Rich | Richest |
| Iceberg compaction | Yes, using our own engine. Supports both MoR (merge-on-read) and CoW (copy-on-write). | Yes, in Databricks managed Iceberg tables | Yes, in Confluent Tableflow | Yes |
| Catalog support | Rich (Unity Catalog, Polaris, Lakekeeper, S3 Tables, AWS Glue, Hive, JDBC, Snowflake Open Catalog, etc.) | Unity Catalog | Unity Catalog, Polaris | AWS Glue, Unity Catalog, etc. |
| Maturity | Mature (Iceberg support since September 2022) | Mature | Mature | Launched managed Iceberg tables in June 2024. |
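
The "Exactly-once" row contrasts commits that are decoupled from checkpoints, with buffering in between, against commits tied to each checkpoint. As a rough illustration of the general idea only, and not of RisingWave's actual implementation, the Python sketch below buffers records in a log and commits them in batches. It stores the last committed log offset together with the data, so a replayed commit is skipped instead of being applied twice. All names here (`FakeTable`, `BufferedSink`, `commit_every`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTable:
    """Stand-in for a table whose commits atomically store rows plus an offset."""
    rows: list = field(default_factory=list)
    committed_offset: int = -1  # highest log offset included in a commit

    def commit(self, batch, last_offset):
        # Idempotence check: a replayed commit carries an offset already covered.
        if last_offset <= self.committed_offset:
            return False
        self.rows.extend(batch)
        self.committed_offset = last_offset
        return True


@dataclass
class BufferedSink:
    table: FakeTable
    commit_every: int = 3  # hypothetical knob: buffered records per commit
    log: list = field(default_factory=list)  # would be durable in a real system

    def write(self, record):
        self.log.append(record)
        pending = len(self.log) - 1 - self.table.committed_offset
        if pending >= self.commit_every:
            self.flush()

    def flush(self):
        # Commit everything in the log after the last committed offset.
        start = self.table.committed_offset + 1
        batch = self.log[start:]
        if batch:
            self.table.commit(batch, len(self.log) - 1)


table = FakeTable()
sink = BufferedSink(table, commit_every=3)
for i in range(7):
    sink.write(i)

# Simulate a retry of the second commit after a crash: it is rejected.
replay_applied = table.commit([3, 4, 5], 5)
sink.flush()
```

A larger `commit_every` produces fewer, larger commits at the cost of freshness, which is the trade-off a tunable commit frequency exposes.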

Reading from Iceberg

| | RisingWave | Apache Spark | Apache Flink |
| --- | --- | --- | --- |
| One-time loading (append-only) | Supported | Supported | Supported |
| One-time loading (mutable, CoW) | Supported | Supported | Supported |
| One-time loading (mutable, MoR) | Supported | Supported | Supported |
| Periodic loading (append-only) | Planned (v2.7) | Supported (via scheduled batch query) | Supported (via scheduled batch query) |
| Periodic loading (mutable, CoW) | Planned (v2.7) | Supported (via scheduled batch query) | Supported (via scheduled batch query) |
| Periodic loading (mutable, MoR) | Planned (v2.7) | Supported (via scheduled batch query) | Supported (via scheduled batch query) |
| Continuous loading (append-only) | Supported | Supported | Supported |
| Continuous loading (mutable, CoW) | Supported | Partially Supported (streams from new snapshots) | Supported (native changelog stream) |
| Continuous loading (mutable, MoR) | Not Supported | Not Supported | Partially Supported (streams changes after compaction) |
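
For append-only tables, periodic and continuous loading both come down to reading only the data files added since the last consumed snapshot. The Python sketch below illustrates that idea with a simplified in-memory snapshot log. The `Snapshot` and `IncrementalReader` types are hypothetical and do not correspond to any RisingWave, Spark, or Flink API. Snapshots are ordered by sequence number here, on the assumption of an Iceberg V2 table where sequence numbers increase with each commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    sequence_number: int  # increases with every commit
    added_files: tuple    # data files appended by this snapshot


class IncrementalReader:
    """Tracks the last consumed snapshot and returns only newer appended files."""

    def __init__(self):
        self.last_sequence_number = 0  # nothing consumed yet

    def poll(self, snapshots):
        # `snapshots` is the table's snapshot log, oldest first.
        new_files = []
        for snap in snapshots:
            if snap.sequence_number > self.last_sequence_number:
                new_files.extend(snap.added_files)
                self.last_sequence_number = snap.sequence_number
        return new_files


log = [
    Snapshot(1, ("a.parquet",)),
    Snapshot(2, ("b.parquet", "c.parquet")),
]
reader = IncrementalReader()
first = reader.poll(log[:1])   # only the first snapshot exists so far
second = reader.poll(log)      # a second snapshot has been committed
third = reader.poll(log)       # nothing new since the last poll
```

Calling `poll` on a schedule corresponds to periodic loading, while calling it in a tight loop approximates continuous loading. Mutable tables are harder because new snapshots may also delete or rewrite existing rows.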

Key Iceberg features in RisingWave

| Feature | Explanation | Why important | Possible alternatives |
| --- | --- | --- | --- |
| ETL / Data Enrichment | | | |
| Real-time ETL | RisingWave provides real-time ETL, continuously transforming and enriching data streams before they land in Iceberg. | Real-time ETL ensures low-latency insights and reduces the need for external ETL pipelines like Spark or Flink. | Spark, Flink |
| ELT / Data Replication | | | |
| CDC-based ELT | Built on the Debezium embedded engine, RisingWave continuously replicates changes from operational databases into Iceberg. | Simplifies data synchronization and guarantees consistency between operational databases and analytical Iceberg tables. | Flink CDC, Kafka Connect |
| Writing to Iceberg | | | |
| Built-in Compaction | RisingWave’s internal compaction service merges small files and snapshots efficiently. | Reduces query overhead and storage costs, and improves Iceberg query performance. | Spark/EMR |
| Controllable Commit Frequency | Users can configure commit frequency based on workload and freshness requirements. | Balances performance, cost, and data freshness for diverse workloads. | Kafka / Kafka Connect |
| Exactly-Once Semantics | Guarantees no duplicate or missing records while respecting primary key constraints from upstream sources. | Maintains strong data consistency and correctness in streaming pipelines. | Flink (no buffering, coupled with checkpoints) |
| Embedded Log Store | Uses an internal log store (similar to an embedded Kafka) to buffer data before committing. | Enables durability, backpressure handling, and fault-tolerant data delivery. | Kafka / Kafka Connect |
| Mutable Stream Modes | Supports two modes: MoR and CoW. MoR writes data files and delete files for high freshness; CoW writes compacted data files for compatibility. | Offers flexibility to balance freshness and compatibility with different query engines. | - |
| Reading from Iceberg | | | |
| One-time Loading (Append-only) | Loads append-only Iceberg tables once for initial data import or snapshot analysis. | Ideal for bootstrapping analytical pipelines or ad-hoc analysis without maintaining state. | Spark, Flink |
| One-time Loading (Mutable, CoW) | Loads CoW-style Iceberg tables once, applying overwrite semantics to ensure consistent snapshot reads. | Enables consistent point-in-time analysis for mutable tables. | Spark, Flink |
| One-time Loading (Mutable, MoR) | Loads MoR-style Iceberg tables once, reading both data and delete files. | Allows accurate reconstruction of the latest table state with equality delete handling. | Spark, Flink |
| Periodic Loading (Append-only) | Periodically imports newly appended data from Iceberg snapshots. | Suitable for near-real-time dashboards that don’t require full streaming. | Spark |
| Periodic Loading (Mutable, CoW) | Periodically reads compacted CoW Iceberg tables at scheduled intervals. | Simplifies incremental updates for mutable datasets while reducing resource usage. | Spark |
| Periodic Loading (Mutable, MoR) | Periodically synchronizes MoR tables, applying delete files so the loaded state stays accurate. | Ensures correctness in workloads where updates or deletes occur. | - |
| Continuous Loading (Append-only) | Continuously monitors append-only Iceberg tables and ingests new data as it arrives. | Enables true streaming from batch data sources with low latency. | Spark, Flink |
| Continuous Loading (Mutable, CoW) | Continuously loads compacted CoW tables to maintain up-to-date materialized views. | Combines near-real-time freshness with compatibility for engines lacking delete-file support. | - |
| Transparent Caching Layer (WIP) | A planned feature exposing an S3-compatible endpoint for caching and query acceleration. | Improves read efficiency while maintaining open, interoperable access. | |
| Compaction & maintenance | | | |
| Automatic Optimization | Compacts small files in the background to keep the table structure healthy. | Improves query speed, reduces metadata overhead, and optimizes storage usage. | Databricks managed Iceberg tables, S3 Tables, etc. |
| Universal Compatibility | Works across all major cloud object stores (AWS, Azure, GCP), even for engines lacking delete support. | Makes Iceberg maintenance cloud-agnostic and engine-independent. | Databricks managed Iceberg tables, S3 Tables, etc. |
| Iceberg-rust Enhancements | RisingWave maintains a fork of Iceberg-rust with support for the V2 spec, a partition writer, and fast append. | Extends Iceberg’s native capabilities for better streaming integration. | - |
| Observability & Custom Policies | Includes status tracking and fine-grained compaction policies. | Allows users to customize optimization behavior per workload, improving control and transparency. | - |
| Iceberg Transformation | | | |
| Cascading Incremental MVs | Supports cascading materialized views on top of Iceberg tables, maintained incrementally. | Enables Medallion-style architecture (Bronze–Silver–Gold) and reduces compute cost for derived data. | Spark Streaming, Flink |
| Postgres-style Transformation Operators | Includes dozens of PostgreSQL-compatible streaming operators such as joins, windows, and aggregations. | Provides a rich transformation layer without relying on external frameworks. | Spark Streaming, Flink |
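
To make the MoR and CoW modes more concrete, the Python sketch below models a table as lists of data files and equality-delete files, each tagged with a sequence number. It is a simplified illustration of the general Iceberg V2 idea, in which an equality delete applies only to rows written with a lower sequence number, and it is not RisingWave's implementation. The function names and the in-memory file representation are hypothetical.

```python
def read_merge_on_read(data_files, delete_files):
    """Apply equality deletes at read time.

    data_files:   list of (seq, rows), where rows is a list of dicts keyed by "id"
    delete_files: list of (seq, deleted_ids)
    A delete applies only to rows written with a lower sequence number.
    """
    result = []
    for data_seq, rows in data_files:
        for row in rows:
            deleted = any(
                row["id"] in ids and del_seq > data_seq
                for del_seq, ids in delete_files
            )
            if not deleted:
                result.append(row)
    return result


def rewrite_copy_on_write(data_files, delete_files):
    """Materialize the merged state into one data file with no delete files."""
    merged = read_merge_on_read(data_files, delete_files)
    new_seq = max((seq for seq, _ in data_files + delete_files), default=0) + 1
    return [(new_seq, merged)], []


# Update id=1 from "a" to "b": the writer adds a delete for id=1 and a new row,
# both with sequence number 2, so the new row survives its own delete.
data_files = [
    (1, [{"id": 1, "v": "a"}, {"id": 2, "v": "x"}]),
    (2, [{"id": 1, "v": "b"}]),
]
delete_files = [(2, {1})]

mor_view = read_merge_on_read(data_files, delete_files)
cow_data, cow_deletes = rewrite_copy_on_write(data_files, delete_files)
cow_view = read_merge_on_read(cow_data, cow_deletes)
```

Both views hold the same rows. The difference is where the merge cost lands: MoR defers it to every reader, while CoW pays it once at write or compaction time so that engines without delete-file support can read the table directly.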