Learn about the different ways to use Iceberg with RisingWave.
Apache Iceberg is an open table format for huge analytic tables, typically stored on object storage. RisingWave offers comprehensive support for working with Iceberg tables, enabling you to leverage Iceberg’s capabilities within your streaming pipelines.
RisingWave provides two distinct approaches for working with Iceberg, each designed for different use cases and architectural patterns.
Connect to existing Iceberg tables managed by external systems
Choose this when you have existing Iceberg tables (created by Spark, Flink, batch jobs, etc.) and want RisingWave to read from or write to them as part of a larger data ecosystem.
Key benefits:
Integration: Work with existing data lakes and multi-system architectures
Flexibility: Support for various external catalog systems (AWS Glue, JDBC, REST, etc.)
Hybrid processing: Combine batch and stream processing on the same tables
Data lake patterns: Implement lambda/kappa architectures with existing infrastructure
Use cases:
Existing Iceberg data lakes that need streaming capabilities.
Multi-engine environments where different systems share Iceberg tables.
Integration into existing ETL/ELT pipelines.
Adding real-time processing to batch-oriented data workflows.
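As a rough illustration of the external approach, the sketch below reads from one existing Iceberg table and writes to another. It assumes a storage-based catalog on S3. The bucket, credentials, database, and table names are placeholders, `order_events` is a hypothetical relation already defined in RisingWave, and the exact parameter names should be checked against the Iceberg connector reference for your RisingWave version.

```sql
-- Sketch only: all names, paths, and credentials below are placeholders.

-- Read from an Iceberg table that another engine (e.g. Spark) created.
CREATE SOURCE orders_iceberg_src
WITH (
    connector = 'iceberg',
    warehouse.path = 's3://example-bucket/warehouse',
    s3.region = 'us-east-1',
    s3.access.key = 'xxxxx',
    s3.secret.key = 'xxxxx',
    catalog.type = 'storage',
    database.name = 'analytics',
    table.name = 'orders'
);

-- Query the external table like any other relation.
SELECT count(*) FROM orders_iceberg_src;

-- Write streaming results into an existing Iceberg table.
-- order_events is a hypothetical table or materialized view in RisingWave.
CREATE SINK order_events_sink FROM order_events
WITH (
    connector = 'iceberg',
    type = 'append-only',
    force_append_only = 'true',
    warehouse.path = 's3://example-bucket/warehouse',
    s3.region = 'us-east-1',
    s3.access.key = 'xxxxx',
    s3.secret.key = 'xxxxx',
    catalog.type = 'storage',
    database.name = 'analytics',
    table.name = 'order_events'
);
```

With a different catalog (AWS Glue, JDBC, REST), the `catalog.type` value and its related settings would change, while the overall source and sink pattern stays the same.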
RisingWave creates and manages Iceberg tables natively
Choose this when you want RisingWave to be the primary owner of your Iceberg tables. RisingWave handles table creation, schema management, and the complete lifecycle while storing data in the standard Iceberg format.
Key benefits:
Simplified architecture: No external catalog setup required with the hosted catalog option
Streaming-first: Direct path from streaming sources to Iceberg format
Native management: Tables work like any other RisingWave table for queries and operations
Ecosystem compatibility: Standard Iceberg tables readable by Spark, Trino, Flink, etc.
Use cases:
New streaming applications where RisingWave is the primary data platform.
Quick start with Iceberg without external infrastructure.
Streaming data directly into analytical storage format.
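For the RisingWave-managed approach, a minimal sketch might look like the following. It assumes the hosted catalog option and an S3 warehouse. The connection name, bucket, credentials, and table schema are placeholders chosen for illustration, and the statement and parameter names should be verified against the current RisingWave reference before use.

```sql
-- Sketch only: all names, paths, and credentials below are placeholders.

-- Define where Iceberg data lives; the hosted catalog avoids external catalog setup.
CREATE CONNECTION my_iceberg_conn WITH (
    type = 'iceberg',
    warehouse.path = 's3://example-bucket/warehouse',
    s3.region = 'us-east-1',
    s3.access.key = 'xxxxx',
    s3.secret.key = 'xxxxx',
    hosted_catalog = true
);

-- Tell the Iceberg table engine which connection to use in this session.
SET iceberg_engine_connection = 'public.my_iceberg_conn';

-- Create a table whose data is stored in Iceberg format.
CREATE TABLE user_events (
    user_id INT PRIMARY KEY,
    event_type VARCHAR,
    event_time TIMESTAMP
) ENGINE = iceberg;

-- The table behaves like any other RisingWave table.
INSERT INTO user_events VALUES (1, 'login', '2024-01-01 00:00:00');
SELECT * FROM user_events;
```

Because the data is written as standard Iceberg tables, other engines such as Spark or Trino could read the same table through the catalog.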
RisingWave’s own internal storage system (Hummock) also persists data on object storage (such as S3), but it uses a row-based format optimized for RisingWave’s internal operations.
When you work with Iceberg, you store or access data in the columnar Iceberg format on object storage, which is designed for analytical workloads and ecosystem interoperability.