When RisingWave streams data into Iceberg, it generates many small data files and creates frequent snapshots. Over time, this can degrade query performance and increase storage costs. To maintain healthy tables, RisingWave provides both automatic and manual maintenance features for compaction and snapshot expiration.
- Compaction: Merges small data files and delete files into larger, optimized files to improve read performance.
- Snapshot Expiration: Removes old, unneeded snapshots and their associated data files to reclaim storage space.
Automatic maintenance
Version notes
- RisingWave introduced Iceberg automatic maintenance (default) in v2.5.0 and added compaction.type with the small-files and files-with-delete compaction types in v2.7.0.
- Parameters with the compaction. prefix, such as compaction.type, are in technical preview and may change in future releases.
You can enable automatic maintenance to run periodically in the background for your Iceberg sinks and internal tables.
Dedicated compactor required
Automatic Iceberg maintenance requires a dedicated compactor service. Contact us via the RisingWave Slack workspace to have the necessary resources allocated for your cluster.
Compaction types
RisingWave supports three compaction types for Iceberg tables. You can specify the type using the compaction.type parameter.
| Compaction type | Description |
|---|---|
| full | Rewrites all data files. This is the default type. |
| small-files | Only compacts files smaller than a specified threshold. Use the compaction.small_files_threshold_mb parameter to set the threshold. |
| files-with-delete | Only compacts data files that have associated delete files. Use the compaction.delete_files_count_threshold parameter to set the minimum number of delete files to trigger compaction. |
The small-files and files-with-delete compaction types are only supported in Merge-on-Read mode. Copy-on-Write mode only supports the full compaction type.
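As a minimal sketch of this restriction, a Copy-on-Write sink would stay on the full compaction type. The sink name, table name, and connection values are placeholders, and the 'copy-on-write' value for write_mode is an assumption based on the 'merge-on-read' value used in the examples on this page; confirm the exact spelling for your version.

```sql
-- Hypothetical Copy-on-Write sink; all names and connection values are placeholders.
CREATE SINK my_cow_sink FROM my_mv
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_db',
    table.name = 'my_cow_table',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    s3.access.key = 'your-key',
    s3.secret.key = 'your-secret',
    -- Assumed value: the counterpart of 'merge-on-read'
    write_mode = 'copy-on-write',
    enable_compaction = true,
    -- full is the default and the only type supported in Copy-on-Write mode
    compaction.type = 'full'
);
```

Setting compaction.type explicitly is optional here because full is the default; it is spelled out only to make the constraint visible.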
Parameters
Configure automatic maintenance by specifying the following parameters in the WITH clause of a CREATE SINK or CREATE TABLE ... ENGINE = iceberg statement.
General parameters
| Parameter | Description |
|---|---|
| enable_compaction | Required. Set to true to enable automatic compaction and snapshot expiration. |
| compaction_interval_sec | Optional. The interval in seconds between maintenance runs. Default: 3600. |
| enable_snapshot_expiration | Optional. Set to true to enable snapshot expiration. By default, it removes snapshots older than 5 days. |
| snapshot_expiration_max_age_millis | Optional. The maximum age (in milliseconds) for a snapshot to be retained. To keep only the latest snapshot, set this to 0. |
| snapshot_expiration_retain_last | Optional. The minimum number of snapshots to retain, regardless of their age. |
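To illustrate how the two retention settings might be combined, the sketch below expires snapshots older than two days while always keeping at least five. The table name and the values are illustrative assumptions. The only arithmetic involved is converting days to milliseconds: 2 * 24 * 60 * 60 * 1000 = 172800000.

```sql
-- Illustrative values only; my_retention_demo is a hypothetical table name.
CREATE TABLE my_retention_demo (
    id INT PRIMARY KEY,
    name VARCHAR
) WITH (
    enable_compaction = true,
    enable_snapshot_expiration = true,
    -- 2 days expressed in milliseconds: 2 * 24 * 60 * 60 * 1000
    snapshot_expiration_max_age_millis = 172800000,
    -- Keep at least 5 snapshots, regardless of their age
    snapshot_expiration_retain_last = 5
) ENGINE = iceberg;
```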
Compaction parameters
| Parameter | Description |
|---|---|
| compaction.type | Optional. The compaction strategy: full, small-files, or files-with-delete. Default: full. |
| compaction.max_snapshots_num | Optional. The maximum number of snapshots allowed since the last rewrite operation. If set, the sink pauses when this number is exceeded and resumes after compaction completes. |
| compaction.trigger_snapshot_count | Optional. The minimum number of snapshots since the last compaction required to trigger a new compaction. Both this threshold and the time interval must be met. |
| compaction.target_file_size_mb | Optional. The target file size in MB for compacted files. |
| compaction.small_files_threshold_mb | Optional. For the small-files compaction type, the size threshold in MB below which files are compacted. |
| compaction.delete_files_count_threshold | Optional. For the files-with-delete compaction type, the minimum number of delete files associated with a data file required to trigger compaction. |
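The interaction between the interval, the trigger count, and the snapshot cap can be sketched as follows. The sink name, connection values, and numbers are illustrative assumptions, not recommended settings.

```sql
-- Hypothetical sink; names, connection values, and thresholds are placeholders.
CREATE SINK my_throttled_sink FROM my_mv
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_db',
    table.name = 'my_table',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    s3.access.key = 'your-key',
    s3.secret.key = 'your-secret',
    enable_compaction = true,
    -- Time condition: 30 minutes between maintenance runs
    compaction_interval_sec = 1800,
    -- Count condition: at least 10 snapshots since the last compaction
    compaction.trigger_snapshot_count = 10,
    -- Cap: the sink pauses once more than 50 snapshots have accumulated
    -- since the last rewrite, until compaction completes
    compaction.max_snapshots_num = 50,
    compaction.target_file_size_mb = 512
);
```

With these values, compaction is triggered only when both the 30-minute interval has elapsed and at least 10 new snapshots exist. The cap of 50 acts as a safety limit if compaction falls behind.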
Examples
Full compaction (default)
The following example enables automatic compaction with the default full compaction type:
CREATE TABLE my_iceberg_table (
    id INT PRIMARY KEY,
    name VARCHAR
) WITH (
    enable_compaction = true,
    compaction_interval_sec = 1800,
    enable_snapshot_expiration = true,
    snapshot_expiration_retain_last = 10
) ENGINE = iceberg;
Small files compaction
For Merge-on-Read tables with many small files, use the small-files compaction type to only compact files smaller than a threshold:
CREATE SINK my_iceberg_sink FROM my_mv
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_db',
    table.name = 'my_table',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    s3.access.key = 'your-key',
    s3.secret.key = 'your-secret',
    write_mode = 'merge-on-read',
    -- Maintenance settings
    enable_compaction = true,
    compaction_interval_sec = 1800,
    compaction.type = 'small-files',
    compaction.small_files_threshold_mb = 128,
    compaction.target_file_size_mb = 512,
    enable_snapshot_expiration = true,
    snapshot_expiration_retain_last = 10
);
Files with delete compaction
For Merge-on-Read tables with accumulated delete files, use the files-with-delete compaction type to only compact data files that have associated delete files:
CREATE SINK my_iceberg_sink FROM my_mv
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_db',
    table.name = 'my_table',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    s3.access.key = 'your-key',
    s3.secret.key = 'your-secret',
    write_mode = 'merge-on-read',
    -- Maintenance settings
    enable_compaction = true,
    compaction_interval_sec = 1800,
    compaction.type = 'files-with-delete',
    compaction.delete_files_count_threshold = 100,
    compaction.trigger_snapshot_count = 10,
    enable_snapshot_expiration = true,
    snapshot_expiration_retain_last = 10
);
Manual maintenance
In addition to automatic background maintenance, you can trigger compaction and snapshot expiration manually at any time using the VACUUM command.
This gives you on-demand control over table optimization and storage cleanup.
-- Compact small files for a table
VACUUM my_iceberg_table;
-- Compact files and expire snapshots older than 2 days
VACUUM my_iceberg_table RETAIN 2 DAYS;