- Continuous ingestion (default): Create an Iceberg source with the CREATE SOURCE statement for continuous, streaming ingestion of append-only data.
- Periodic full reload: Create an Iceberg table with refresh_mode = 'FULL_RELOAD' for scheduled full table refreshes. Note that you must use CREATE TABLE (not CREATE SOURCE), and data is loaded only after a refresh is triggered, either manually or on the configured schedule.
Prerequisites
- An existing Apache Iceberg table managed by external systems.
- Access credentials for the underlying storage system (e.g., S3 access key and secret key).
- Network connectivity between RisingWave and your storage system.
- Knowledge of your Iceberg catalog type and configuration.
Continuous ingestion with CREATE SOURCE
Limitations
- PRIMARY KEY is not supported. Iceberg CREATE SOURCE is for append-only, streaming ingestion. RisingWave internally uses a hidden, system-generated _row_id column to uniquely identify each record, which is incompatible with user-defined primary keys. If you need to ingest a mutable Iceberg table (with updates and deletes), use CREATE TABLE with refresh_mode = 'FULL_RELOAD' instead. Note that FULL_RELOAD is periodic snapshot-style ingestion, not continuous streaming.
- Column definitions are not allowed for Iceberg sources. RisingWave automatically infers the schema from the Iceberg table metadata. Specifying column definitions in the CREATE SOURCE statement will result in an error. Use CREATE SOURCE <name> WITH (...) rather than CREATE SOURCE <name> (<columns>) WITH (...).
Basic connection example
A basic connection reads a table in S3 and uses AWS Glue as the catalog; the statement takes the parameters listed below.
Parameters
| Parameter | Description | Example |
|---|---|---|
| connector | Required. For Iceberg sources, it must be 'iceberg'. | 'iceberg' |
| database.name | Required. The Iceberg database/namespace name. | 'analytics' |
| table.name | Required. The Iceberg table name. | 'user_events' |
| commit_checkpoint_interval | Optional. Determines the Iceberg commit frequency. Default: 60 (about 60 seconds in the default configuration). | 60 |
Object storage and catalog parameters must also be specified in the CREATE SOURCE statement. Because these parameters are shared across all Iceberg objects (sources, sinks, and internal Iceberg tables), they are documented separately:
- Object storage: Object storage configuration
- Catalogs: Catalog configuration
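Putting the parameters above together, a Glue-backed source might look like the following sketch. The source name, bucket, region, catalog name, and credentials are placeholders, and the catalog and storage parameter names (catalog.type, catalog.name, warehouse.path, s3.*) come from the shared configuration pages rather than this section, so check them against those pages before use.

```sql
-- Sketch only: every value below is a placeholder.
-- No column list is given; the schema is inferred from the Iceberg table.
CREATE SOURCE user_events_source
WITH (
    connector = 'iceberg',
    -- Catalog settings (see Catalog configuration)
    catalog.type = 'glue',
    catalog.name = 'my_glue_catalog',
    warehouse.path = 's3://my-data-lake/warehouse',
    -- Object storage settings (see Object storage configuration)
    s3.region = 'us-west-2',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    -- Iceberg table to ingest
    database.name = 'analytics',
    table.name = 'user_events'
);
```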
Source example
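A source on a REST catalog might be declared as in the sketch below. The source name, catalog URI, bucket, and credentials are hypothetical, and catalog.uri is assumed to be the parameter that carries the REST endpoint; confirm it in the catalog configuration reference.

```sql
-- Sketch only: the endpoint, bucket, and credentials are placeholders.
CREATE SOURCE rest_iceberg_source
WITH (
    connector = 'iceberg',
    -- REST catalog settings (assumed parameter names)
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-data-lake/warehouse',
    -- Object storage settings
    s3.region = 'us-west-2',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    -- Iceberg table to ingest
    database.name = 'analytics',
    table.name = 'user_events'
);
```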
For a REST catalog, the catalog parameters are described in Catalog configuration.
Periodic full reload with CREATE TABLE
Added in v2.7.0. This feature is currently in technical preview.
To reload an Iceberg table on a schedule, create it with refresh_mode = 'FULL_RELOAD'. This mode is useful when:
- The external Iceberg table supports mutable data (updates and deletes).
- You need a point-in-time snapshot of the entire table.
- You want to apply Iceberg deletes (PositionDeletes and EqualityDeletes) for accurate query results.
- Periodic full reloads fit your use case better than continuous streaming.
Create a refreshable table
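As a rough sketch, a refreshable table could be declared as follows. The table name, the id key column, and all connection values are hypothetical. The catalog and storage parameter names are taken from the shared Iceberg configuration, and the form in which the parentheses hold only a PRIMARY KEY constraint is an assumption to verify against the CREATE TABLE reference.

```sql
-- Sketch only: names, paths, and credentials are placeholders.
-- No column types are declared; only the key constraint is given,
-- and id is assumed to exist in the Iceberg table schema.
CREATE TABLE iceberg_batch_table (
    PRIMARY KEY (id)
) WITH (
    connector = 'iceberg',
    catalog.type = 'glue',
    catalog.name = 'my_glue_catalog',
    warehouse.path = 's3://my-data-lake/warehouse',
    s3.region = 'us-west-2',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    database.name = 'analytics',
    table.name = 'user_events',
    -- Reload the full table on each refresh
    refresh_mode = 'FULL_RELOAD',
    -- Optional: refresh automatically every 60 seconds
    refresh_interval_sec = '60'
);
```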
Do not define column types in the CREATE TABLE statement. RisingWave automatically infers the schema from the Iceberg table. Only specify a PRIMARY KEY constraint if the corresponding column exists in the Iceberg table schema.
Parameters
| Parameter | Description | Required | Example |
|---|---|---|---|
| refresh_mode | Must be set to 'FULL_RELOAD' to enable periodic refresh functionality. | Yes | 'FULL_RELOAD' |
| refresh_interval_sec | Interval in seconds between automatic refresh operations. | No | '60' |
Automatic refreshes are triggered by a scheduler that runs at a fixed interval (set by stream_refresh_scheduler_interval_sec in the RisingWave configuration file). Setting a refresh_interval_sec value lower than this scheduler interval may result in refresh triggers not occurring at the expected frequency.
If you omit refresh_interval_sec, the table will only refresh when you manually execute REFRESH TABLE, giving you complete control over when data is loaded.
Manual refresh
You can manually trigger a refresh at any time using the REFRESH TABLE command:
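For instance, assuming a refreshable table named iceberg_batch_table (a hypothetical name), a manual refresh might look like this:

```sql
-- Trigger one full reload of the table from the Iceberg source of truth.
REFRESH TABLE iceberg_batch_table;
```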
Monitor delete files
You can verify discovered delete files via the rw_iceberg_files system catalog:
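One way to check this is to count the discovered files by type. The sketch below relies only on the content column of rw_iceberg_files; any other columns of that catalog are not assumed here.

```sql
-- Count discovered files per content type
-- (Data, PositionDeletes, EqualityDeletes).
SELECT content, count(*) AS file_count
FROM rw_iceberg_files
GROUP BY content;
```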
The content column indicates the file type:
- Data: Regular data files
- PositionDeletes: Position-based delete files
- EqualityDeletes: Equality-based delete files
Monitor refresh status
Query the rw_catalog.rw_refresh_table_state system catalog to monitor refresh operations:
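A minimal sketch of such a query is shown here. Only the current_status column is assumed; the first statement returns whatever other columns the catalog exposes.

```sql
-- List the refresh state of all refreshable tables.
SELECT * FROM rw_catalog.rw_refresh_table_state;

-- Show only tables with a refresh currently in progress.
SELECT *
FROM rw_catalog.rw_refresh_table_state
WHERE current_status = 'REFRESHING';
```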
The current_status field shows the current state of the refresh job:
- IDLE: No refresh operation is currently in progress
- REFRESHING: A refresh operation is in progress