Iceberg table engine
Learn how to use the Iceberg table engine in RisingWave to store data natively in the Iceberg format.
In RisingWave, the Iceberg table engine lets you create and manage tables directly within the system while storing their underlying data in the Apache Iceberg format on external object storage. This is an alternative to RisingWave's default row-based internal storage format, which typically also resides on object storage.
Using the Iceberg table engine provides several benefits:
- Native management: RisingWave manages the table’s lifecycle (creation, schema, writes). You interact with it like any other RisingWave table (querying, inserting, using in materialized views).
- Iceberg format storage: Data is physically stored according to the Iceberg specification, using a configured Iceberg catalog and object storage path. This ensures compatibility with the Iceberg ecosystem.
- Simplified pipelines: If Iceberg is your target format, you don't need a separate CREATE SINK step to export data from RisingWave. Ingested or computed data lands directly in these Iceberg tables, which RisingWave manages.
- Interoperability: Tables created with the Iceberg engine are standard Iceberg tables and can be read by external Iceberg-compatible query engines (like Spark, Trino, Flink, Dremio) using the same catalog and storage configuration.
This guide details how to set up and use the Iceberg table engine.
Setup and usage
1. Create an Iceberg connection
The Iceberg connection contains information about the catalog and object storage. For syntax and properties, see CREATE CONNECTION.
The following examples show how to create an Iceberg connection using different catalog types.
These examples use S3 for object storage. You can also use Google Cloud Storage (GCS) or Azure Blob Storage by replacing the S3 parameters with the corresponding parameters for your storage backend. See Object storage configuration for details.
For the simplest setup, use RisingWave’s built-in catalog service:
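A minimal sketch of a hosted-catalog connection follows. The connection name my_iceberg_conn, the bucket path, the region, and the credentials are placeholders; confirm the property names against CREATE CONNECTION for your RisingWave version.

```sql
-- Hosted catalog: RisingWave provides the Iceberg catalog itself.
-- Connection name, bucket path, region, and keys are placeholders.
CREATE CONNECTION my_iceberg_conn WITH (
    type = 'iceberg',
    warehouse.path = 's3://my-bucket/iceberg-warehouse',
    s3.region = 'us-east-1',
    s3.access.key = 'YOUR_ACCESS_KEY',
    s3.secret.key = 'YOUR_SECRET_KEY',
    hosted_catalog = true
);
```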
For complete details on the hosted catalog, see Hosted Iceberg Catalog.
For details on catalog configuration parameters, see Catalog configuration.
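With an external catalog, the connection carries catalog.* properties instead of hosted_catalog. The sketch below assumes a PostgreSQL-backed JDBC catalog; the connection name, database URI, user, password, and catalog name are all hypothetical, and the exact property names should be checked in Catalog configuration.

```sql
-- External JDBC catalog backed by PostgreSQL (all values are placeholders).
CREATE CONNECTION jdbc_iceberg_conn WITH (
    type = 'iceberg',
    warehouse.path = 's3://my-bucket/iceberg-warehouse',
    s3.region = 'us-east-1',
    s3.access.key = 'YOUR_ACCESS_KEY',
    s3.secret.key = 'YOUR_SECRET_KEY',
    catalog.type = 'jdbc',
    catalog.uri = 'jdbc:postgresql://catalog-db:5432/iceberg_catalog',
    catalog.jdbc.user = 'catalog_user',
    catalog.jdbc.password = 'catalog_password',
    catalog.name = 'dev'
);
```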
2. Set connection as default (optional)
To simplify table creation, you can set a default connection for the Iceberg engine:
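A sketch of setting the default connection, assuming the iceberg_engine_connection setting takes a schema-qualified connection name; public.my_iceberg_conn is a placeholder for your own connection.

```sql
-- Subsequent CREATE TABLE ... ENGINE = iceberg statements use this connection.
SET iceberg_engine_connection = 'public.my_iceberg_conn';
```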
When a default connection is set, you can create Iceberg tables without specifying a connection.
3. Create an Iceberg table
Create a table using the Iceberg engine:
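A minimal sketch, assuming a default Iceberg connection has already been set. The table name user_events and its columns are hypothetical; the ENGINE = iceberg clause is what selects the Iceberg table engine.

```sql
-- Hypothetical table stored in Iceberg format via the default connection.
CREATE TABLE user_events (
    user_id    INT,
    event_type VARCHAR,
    event_time TIMESTAMPTZ,
    PRIMARY KEY (user_id, event_time)
) ENGINE = iceberg;
```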
4. Work with your table
Once created, Iceberg tables work like any other RisingWave table:
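For illustration, assuming an Iceberg table named user_events with the columns (user_id, event_type, event_time) exists, ordinary inserts, queries, and materialized views look like this. Names and values are placeholders.

```sql
-- Insert rows (placeholder values).
INSERT INTO user_events VALUES
    (1, 'click', '2025-01-01 10:00:00+00'),
    (2, 'view',  '2025-01-01 10:05:00+00');

-- Query the table like any other RisingWave table.
SELECT * FROM user_events WHERE user_id = 1;

-- Build a materialized view on top of the Iceberg table.
CREATE MATERIALIZED VIEW event_counts AS
SELECT event_type, COUNT(*) AS cnt
FROM user_events
GROUP BY event_type;
```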
Stream data into Iceberg tables
You can stream data directly from sources into Iceberg tables:
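One way to do this is sketched below: define a source, then continuously write its rows into the Iceberg table with CREATE SINK ... INTO. The Kafka topic, broker address, and source and sink names are hypothetical, and the target table user_events is assumed to exist with matching columns.

```sql
-- Hypothetical Kafka source; topic and broker address are placeholders.
CREATE SOURCE user_events_src (
    user_id    INT,
    event_type VARCHAR,
    event_time TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user_events',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;

-- Continuously write the source's rows into the Iceberg table.
CREATE SINK user_events_sink INTO user_events AS
SELECT user_id, event_type, event_time FROM user_events_src;
```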
Time travel
Query historical snapshots of your Iceberg tables:
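A sketch assuming the FOR SYSTEM_TIME AS OF and FOR SYSTEM_VERSION AS OF clauses that RisingWave uses for Iceberg time travel. The table name, timestamp, and snapshot ID are placeholders; a real snapshot ID comes from the table's Iceberg metadata.

```sql
-- Read the table as it was at a point in time (placeholder timestamp).
SELECT * FROM user_events
FOR SYSTEM_TIME AS OF TIMESTAMPTZ '2025-01-01 12:00:00+00';

-- Read a specific Iceberg snapshot (placeholder snapshot ID).
SELECT * FROM user_events
FOR SYSTEM_VERSION AS OF 4476030648521181855;
```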
External access
Tables created with the Iceberg engine are standard Iceberg tables that can be accessed by external tools:
Spark:
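A Spark SQL sketch, assuming Spark has an Iceberg catalog registered under the hypothetical name rw_catalog that points at the same catalog and warehouse, and that the table lives in a namespace named public (also an assumption).

```sql
-- Spark SQL: read the RisingWave-managed Iceberg table.
SELECT * FROM rw_catalog.public.user_events LIMIT 10;
```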
Trino:
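A Trino sketch, assuming an Iceberg connector catalog named iceberg configured with the same catalog and storage settings; the namespace public and table name are assumptions.

```sql
-- Trino: read the RisingWave-managed Iceberg table.
SELECT * FROM iceberg.public.user_events LIMIT 10;
```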
Use Amazon S3 Tables with the Iceberg table engine
Amazon S3 Tables provides an AWS-native Iceberg catalog service. When you use S3 Tables as the catalog for Iceberg tables, S3 Tables compacts the tables' data files automatically.
Create S3 Tables connection
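A sketch that reaches S3 Tables through its Iceberg REST endpoint with SigV4 signing. The table bucket ARN, account ID, region, and credentials are placeholders, and the catalog.rest.* property names are an assumption to verify against CREATE CONNECTION for your RisingWave version.

```sql
-- Amazon S3 Tables via its Iceberg REST endpoint (all values are placeholders).
CREATE CONNECTION s3_tables_conn WITH (
    type = 'iceberg',
    -- For S3 Tables, the warehouse is the table bucket ARN.
    warehouse.path = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket',
    s3.region = 'us-east-1',
    s3.access.key = 'YOUR_ACCESS_KEY',
    s3.secret.key = 'YOUR_SECRET_KEY',
    catalog.type = 'rest',
    catalog.uri = 'https://s3tables.us-east-1.amazonaws.com/iceberg',
    -- Requests to the REST catalog are signed with SigV4 for the s3tables service.
    catalog.rest.signing_region = 'us-east-1',
    catalog.rest.signing_name = 's3tables',
    catalog.rest.sigv4_enabled = true
);
```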
Create Iceberg table with S3 Tables
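Once the connection exists, table creation is the same as with any other catalog. This sketch assumes a connection named public.s3_tables_conn and a hypothetical orders table.

```sql
-- Route Iceberg-engine tables to the S3 Tables connection.
SET iceberg_engine_connection = 'public.s3_tables_conn';

-- Hypothetical table; S3 Tables handles compaction of its data files.
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    amount   DECIMAL,
    order_ts TIMESTAMPTZ
) ENGINE = iceberg;
```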
For more details on S3 Tables configuration, see Object storage configuration.
Configuration options
Commit intervals
Control how frequently data is committed to the Iceberg table:
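A sketch using the commit_checkpoint_interval table option, which sets how many checkpoints pass between Iceberg commits. The table name, columns, and the value 60 are illustrative only.

```sql
-- Commit to Iceberg once every 60 checkpoints (illustrative value).
CREATE TABLE user_events_batched (
    user_id    INT,
    event_type VARCHAR,
    event_time TIMESTAMPTZ,
    PRIMARY KEY (user_id, event_time)
) WITH (commit_checkpoint_interval = 60)
ENGINE = iceberg;
```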
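The approximate time between commits is calculated as:

commit time ≈ barrier_interval_ms × checkpoint_frequency × commit_checkpoint_interval

where commit_checkpoint_interval is the table option that sets how many checkpoints pass between commits, and barrier_interval_ms and checkpoint_frequency are system parameters that define the base checkpointing rate.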
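As a worked example, assume barrier_interval_ms = 1000 and checkpoint_frequency = 1 (check the actual values on your cluster), with commit_checkpoint_interval = 60:

```latex
\text{commit time} \approx 1000\,\text{ms} \times 1 \times 60 = 60\,000\,\text{ms} = 60\,\text{s}
```

Under these assumptions, new data becomes visible to external Iceberg readers about once a minute.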
Limitations
Current limitations of the Iceberg table engine:
- No automatic compaction: Tables don't automatically compact small files (S3 Tables provides this).
- Limited DDL operations: Some schema changes may require recreating the table
- Single writer: Only RisingWave should write to tables created with this engine
Best practices
- Use hosted catalog for simple setups: Start with hosted_catalog = true for quick development.
- Configure appropriate commit intervals: Balance latency against file size. Shorter intervals make data visible sooner but produce more, smaller files.
- Consider S3 Tables for production: Automatic compaction and AWS-native management.
- Design proper partitioning: Plan your partition strategy for query performance.
- Monitor file sizes: Be aware of small file accumulation and plan compaction strategy.
Next steps
- Learn about hosted catalog: See Hosted Iceberg Catalog for the simplest setup.
- External catalog setup: Review Catalog configuration for production deployments.
- Storage configuration: Configure your object storage in Object storage configuration.