Learn how to use the Iceberg table engine in RisingWave to store data natively in the Iceberg format.
In RisingWave, the Iceberg table engine allows you to create and manage tables directly within the system, while storing their underlying data in the Apache Iceberg format on external object storage. This offers an alternative way to persist data compared to RisingWave’s default row-based internal storage format (which also typically uses object storage).
Using the Iceberg table engine provides several benefits:
This guide details how to set up and use the Iceberg table engine.
The Iceberg connection contains information about the catalog and object storage. For syntax and properties, see `CREATE CONNECTION`.
The following examples show how to create an Iceberg connection using different catalog types.
These examples use S3 for object storage. You can also use Google Cloud Storage (GCS) or Azure Blob Storage by replacing the S3 parameters with the appropriate parameters for your chosen storage backend. See the object storage configuration for details.
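As a sketch, a connection using the `storage` catalog type with S3 might look like the following; the connection name, bucket path, region, and credential values are placeholders you must replace with your own:

```sql
-- Hypothetical example: a storage-catalog Iceberg connection backed by S3.
CREATE CONNECTION my_iceberg_conn
WITH (
    type = 'iceberg',
    catalog.type = 'storage',
    warehouse.path = 's3://my-bucket/warehouse/',  -- placeholder bucket path
    s3.region = 'us-east-1',                       -- placeholder region
    s3.access.key = '<your-access-key>',
    s3.secret.key = '<your-secret-key>'
);
```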
You need to configure the Iceberg table engine to use the connection you just created. All tables created with the Iceberg table engine will then use this connection by default.
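For example, assuming a connection named `my_iceberg_conn` in the `public` schema, the engine connection can be set for the current session or persisted system-wide:

```sql
-- Session-level setting (assumes connection public.my_iceberg_conn exists)
SET iceberg_engine_connection = 'public.my_iceberg_conn';

-- Or persist it at the system level
ALTER SYSTEM SET iceberg_engine_connection = 'public.my_iceberg_conn';
```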
Now you can create a table using the standard `CREATE TABLE` syntax, adding the `ENGINE = iceberg` clause.
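A minimal sketch (the table name and columns are illustrative):

```sql
-- Data for this table is persisted in the Iceberg format on object storage.
CREATE TABLE users_iceberg (
    id INT PRIMARY KEY,
    name VARCHAR,
    created_at TIMESTAMP
) ENGINE = iceberg;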
The `commit_checkpoint_interval` parameter controls how frequently (every N checkpoints) RisingWave commits changes to the Iceberg table, creating a new Iceberg snapshot. The default value is 60. Typically, the checkpoint interval is 1 second, which means RisingWave commits changes to the Iceberg table every 60 seconds.

The approximate commit interval can be calculated as `time = barrier_interval_ms × checkpoint_frequency × commit_checkpoint_interval`. `barrier_interval_ms` and `checkpoint_frequency` are system parameters that define the base checkpointing rate; `commit_checkpoint_interval` is configurable in the Iceberg table engine.
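As an illustration, assuming the default system parameters (`barrier_interval_ms = 1000`, `checkpoint_frequency = 1`), a table created with a lower `commit_checkpoint_interval` commits more often:

```sql
-- Hypothetical example: commit roughly every 10 seconds
-- (1000 ms × 1 × 10 = 10 s per Iceberg commit/snapshot).
CREATE TABLE events_iceberg (
    event_id BIGINT PRIMARY KEY,
    payload VARCHAR
)
WITH (commit_checkpoint_interval = 10)
ENGINE = iceberg;
```

Note that more frequent commits produce more, smaller Iceberg snapshots and data files.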
For tables created with the Iceberg table engine, you can insert data using standard `INSERT` statements and query them using `SELECT`. You can use them as base tables to create materialized views or join them with regular tables. However, RisingWave doesn’t support renaming or changing the schemas of natively managed Iceberg tables.
Besides creating a table with a connector, you can also sink data from another RisingWave source, table, or materialized view into a table created with the Iceberg engine.
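A sketch of this pattern, assuming an Iceberg-engine table `users_iceberg` and a hypothetical upstream materialized view `user_events_mv` with a matching schema:

```sql
-- Sink rows from an existing materialized view into the Iceberg-engine table.
CREATE SINK user_events_sink INTO users_iceberg
FROM user_events_mv;
```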
RisingWave’s Iceberg table engine supports using Amazon S3 Tables as an Iceberg catalog. This allows you to manage Iceberg tables directly within AWS S3 Tables via RisingWave.
Follow these steps to configure this integration:
Define an Iceberg connection using the `rest` catalog type. You must provide your AWS credentials and specify the necessary S3 Tables REST catalog configurations.
Required REST catalog parameters:

| Parameter name | Description | Value for S3 Tables |
|---|---|---|
| `catalog.rest.signing_region` | The AWS region for signing REST catalog requests. | e.g., `us-east-1` |
| `catalog.rest.signing_name` | The service name for signing REST catalog requests. | `s3tables` |
| `catalog.rest.sigv4_enabled` | Enables SigV4 signing for REST catalog requests. | `true` |
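Putting these together, a connection might look like the following sketch; the endpoint, account ID, table bucket ARN, region, and credentials are placeholders to replace with your own values:

```sql
-- Hypothetical example: Iceberg connection using Amazon S3 Tables as a REST catalog.
CREATE CONNECTION s3_tables_conn
WITH (
    type = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'https://s3tables.us-east-1.amazonaws.com/iceberg',  -- regional S3 Tables endpoint
    warehouse.path = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket',  -- placeholder ARN
    catalog.rest.signing_region = 'us-east-1',
    catalog.rest.signing_name = 's3tables',
    catalog.rest.sigv4_enabled = 'true',
    s3.region = 'us-east-1',
    s3.access.key = '<your-access-key>',
    s3.secret.key = '<your-secret-key>'
);
```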
Replace `<...>` with your specific configuration details. Then set the active connection for the Iceberg table engine. You can configure this for the current session or persist it at the system level.
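For example, assuming the connection is named `s3_tables_conn` in the `public` schema:

```sql
-- Session-level setting
SET iceberg_engine_connection = 'public.s3_tables_conn';

-- Or persist it system-wide
ALTER SYSTEM SET iceberg_engine_connection = 'public.s3_tables_conn';
```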
With the connection configured, you can now create tables using `ENGINE = iceberg`. These tables will be registered in your specified Amazon S3 Tables catalog.
RisingWave will now manage `my_iceberg_table` using the Iceberg format, with metadata stored in and accessible via Amazon S3 Tables.
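A minimal sketch of such a table (the columns are illustrative):

```sql
-- Registered in the Amazon S3 Tables catalog configured above.
CREATE TABLE my_iceberg_table (
    id INT PRIMARY KEY,
    data VARCHAR
) ENGINE = iceberg;
```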
Added in v2.4.0.
You can create an append-only table with the Iceberg engine. This option has implications for how data is stored and how it can be consumed downstream.
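A sketch of the syntax, with illustrative columns:

```sql
-- Append-only: rows are only ever inserted, never updated or deleted.
CREATE TABLE sensor_readings (
    sensor_id INT,
    reading DOUBLE PRECISION,
    ts TIMESTAMP
) APPEND ONLY
ENGINE = iceberg;
```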
Iceberg’s snapshotting mechanism enables time travel queries. You can query the state of the table as of a specific timestamp or snapshot ID using the `FOR SYSTEM_TIME AS OF` or `FOR SYSTEM_VERSION AS OF` clauses.
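For instance, assuming an Iceberg-engine table `users_iceberg` (the timestamp and snapshot ID below are placeholders):

```sql
-- State of the table as of a specific timestamp
SELECT * FROM users_iceberg
FOR SYSTEM_TIME AS OF '2000-01-01 00:00:00+00:00';

-- State of the table as of a specific Iceberg snapshot ID
SELECT * FROM users_iceberg
FOR SYSTEM_VERSION AS OF 1234567890123456789;
```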
Since the data is stored in the standard Iceberg format, external systems (like Spark, Trino, Dremio) can directly query the tables created by RisingWave’s Iceberg engine. To do this, configure the external system with the same Iceberg connection details used in RisingWave: the same catalog type (`storage`, `jdbc`, `glue`, or `rest`) along with the corresponding catalog and object storage properties. The namespace and table name in the external catalog will typically match the schema and table name in RisingWave. For example, the table `public.users_iceberg` in RisingWave would be accessed as table `users_iceberg` within the `public` namespace/database in the configured Iceberg catalog by an external tool.
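For example, in a Spark SQL session where a catalog has been configured against the same warehouse and catalog settings (the catalog name `rw_catalog` is an assumption made for illustration):

```sql
-- Query the RisingWave-managed Iceberg table from Spark SQL.
SELECT * FROM rw_catalog.public.users_iceberg LIMIT 10;
```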
RisingWave does not currently have a built-in automatic compaction service for tables created with the Iceberg engine. Streaming ingestion, especially with frequent commits (a low `commit_checkpoint_interval`), can lead to many small files. You may need to run compaction procedures manually using external tools that operate on Iceberg tables to optimize read performance. We are actively working on integrating compaction features.
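For instance, if you have a Spark environment attached to the same catalog, Iceberg’s built-in `rewrite_data_files` Spark procedure can compact small files (`rw_catalog` is a placeholder catalog name):

```sql
-- Compact small data files for the table from Spark SQL.
CALL rw_catalog.system.rewrite_data_files(table => 'public.users_iceberg');
```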