This guide shows how to sink data from RisingWave into an Iceberg table backed by AWS Glue (catalog) and Amazon S3 (warehouse), and then query the table using Databricks.

Prerequisites

Before you begin, make sure you have:
  • A running RisingWave cluster.
  • (Optional) An Iceberg compactor if you plan to sink upsert streams. Contact our support team or sales team if you need this.
  • A Databricks cluster.
  • An Amazon S3 bucket.
  • An AWS Glue Data Catalog.

Iceberg catalog and warehouse

The Iceberg catalog should be AWS Glue. For the warehouse, we recommend using Amazon S3.

Sink data from RisingWave into Iceberg

Follow the instructions for creating an Iceberg sink to write your data into an Iceberg table. Below are two examples.
Glue + S3 (append-only)
CREATE SINK glue_sink FROM my_data
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_database',
    table.name = 'my_table',
    catalog.type = 'glue',
    catalog.name = 'my_catalog',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-west-2'
);
Glue + S3 (upsert)
CREATE SINK glue_sink FROM my_data
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',  -- placeholder; use the primary key column(s) of my_data
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_database',
    table.name = 'my_table',
    catalog.type = 'glue',
    catalog.name = 'my_catalog',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-west-2',
    write_mode = 'copy-on-write',
    enable_compaction = true,
    compaction_interval_sec = 300
);
For upsert sinks, use copy-on-write mode (write_mode = 'copy-on-write') and enable Iceberg compaction, because Databricks cannot read Iceberg position delete or equality delete files. Copy-on-write mode relies on Iceberg compaction to apply changes, so compaction_interval_sec determines how fresh the Iceberg table is. The example above sets it to 300 seconds.
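
The sink examples above read from a relation named my_data, which this guide does not define. As a minimal sketch, assuming a simple table keyed by id (the column names and sample values are hypothetical, chosen only for illustration), you could create the upstream table in RisingWave, write a few changes, and confirm that the sink exists:

```sql
-- Hypothetical upstream table; the columns are assumptions for illustration.
CREATE TABLE my_data (
    id INT PRIMARY KEY,
    name VARCHAR,
    updated_at TIMESTAMP
);

-- Write a row and then change it, so the upsert sink has an update to apply.
INSERT INTO my_data VALUES (1, 'alice', '2024-01-01 00:00:00');
UPDATE my_data SET name = 'alicia' WHERE id = 1;

-- Make the changes above visible before continuing.
FLUSH;

-- List the sinks in the current schema to confirm that glue_sink was created.
SHOW SINKS;
```

With the upsert configuration, expect the update to become visible in the Iceberg table only after a compaction has run, not immediately after the write.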

Query Iceberg table in Databricks

Follow the Databricks documentation on Unity Catalog Lakehouse Federation to query Iceberg data registered in AWS Glue. Once federation is configured, you can query the Iceberg table directly from Databricks.
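
As a rough sketch of what a query might look like once federation is in place, the statements below use Unity Catalog's three-level catalog.schema.table naming. The catalog name glue_federated is an assumption; replace it with the name of the foreign catalog you created in Databricks. my_database and my_table come from the sink examples above.

```sql
-- glue_federated is a hypothetical foreign catalog name; use your own.
-- Inspect the table's schema and metadata as seen by Databricks.
DESCRIBE TABLE EXTENDED glue_federated.my_database.my_table;

-- Read a few rows written by the RisingWave sink.
SELECT *
FROM glue_federated.my_database.my_table
LIMIT 10;
```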