This quickstart shows how to use an externally managed Iceberg table with AWS Glue Data Catalog:
  • Create the Iceberg table outside RisingWave (using Amazon Athena)
  • Deliver data to the table with CREATE SINK
  • Read data from the same table with CREATE SOURCE
All steps are runnable with the AWS CLI and RisingWave SQL.

Prerequisites

  • A running RisingWave cluster (self-hosted or RisingWave Cloud) and access to run SQL.
  • AWS CLI configured with credentials (an access key ID and secret access key).
  • Access to AWS Glue, S3, and Athena in a region where Athena engine v3 is available.
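
Before starting, you can confirm the CLI actually sees your credentials. A minimal sketch (the helper name is ours, not part of the quickstart):

```shell
# Sanity check: print the AWS account ID the CLI is authenticated as.
# A failure here means credentials are not configured correctly.
aws_account_id() {
  aws sts get-caller-identity --query 'Account' --output text
}
```

Running `aws_account_id` should print your 12-digit account ID; any error means you should fix your CLI configuration before continuing.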

Step 1: Create an S3 bucket and Glue database

export AWS_REGION=us-west-2
export ICEBERG_BUCKET=my-iceberg-demo-bucket
export GLUE_DB=rw_quickstart

aws s3 mb "s3://${ICEBERG_BUCKET}" --region "${AWS_REGION}"

# Athena needs an output location for query results
export ATHENA_OUTPUT="s3://${ICEBERG_BUCKET}/athena-results/"

aws glue create-database \
  --region "${AWS_REGION}" \
  --database-input "{\"Name\":\"${GLUE_DB}\"}"
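
Before moving on, you can verify that both resources were created. A small sketch (the helper name is ours):

```shell
# Verify the S3 bucket and Glue database created above exist.
# Prints the database name on success; returns non-zero if either check fails.
check_step1_resources() {
  aws s3api head-bucket --bucket "${ICEBERG_BUCKET}" --region "${AWS_REGION}" || return 1
  aws glue get-database --region "${AWS_REGION}" --name "${GLUE_DB}" \
    --query 'Database.Name' --output text
}
```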

Step 2: Create an Iceberg table in Glue (via Athena)

Create the table:
aws athena start-query-execution \
  --region "${AWS_REGION}" \
  --query-string "CREATE TABLE ${GLUE_DB}.rw_events (id int, event_type string, event_ts timestamp) LOCATION 's3://${ICEBERG_BUCKET}/warehouse/${GLUE_DB}/rw_events/' TBLPROPERTIES ('table_type'='ICEBERG')" \
  --result-configuration "OutputLocation=${ATHENA_OUTPUT}"
Optionally insert some rows (so you can see data immediately when reading in RisingWave):
aws athena start-query-execution \
  --region "${AWS_REGION}" \
  --query-string "INSERT INTO ${GLUE_DB}.rw_events VALUES (100,'seed','2026-01-01 00:00:00'),(101,'seed','2026-01-01 00:01:00')" \
  --result-configuration "OutputLocation=${ATHENA_OUTPUT}"
For production use, check each Athena query's execution status before continuing; for brevity, the commands above do not poll.
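
If you do want to wait for each query, here is a minimal polling sketch (assuming bash and AWS CLI v2; the `wait_for_query` helper name is ours):

```shell
# Poll an Athena query until it reaches a terminal state (up to ~60 s).
# Usage: wait_for_query <query-execution-id>; returns 0 only on SUCCEEDED.
wait_for_query() {
  qid="$1"
  for _ in $(seq 1 60); do
    state=$(aws athena get-query-execution \
      --region "${AWS_REGION}" \
      --query-execution-id "$qid" \
      --query 'QueryExecution.Status.State' \
      --output text)
    case "$state" in
      SUCCEEDED) return 0 ;;
      FAILED|CANCELLED) echo "query $qid ended in $state" >&2; return 1 ;;
    esac
    sleep 1
  done
  echo "timed out waiting for query $qid" >&2
  return 1
}
```

To use it, capture the query ID when submitting, e.g. append `--query 'QueryExecutionId' --output text` to `start-query-execution`, assign the result to a variable, and pass it to `wait_for_query`.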

Step 3: Deliver data to the Glue-managed Iceberg table (RisingWave)

CREATE TABLE local_events (
  id INT,
  event_type VARCHAR,
  event_ts TIMESTAMP
);

CREATE SINK to_glue_events FROM local_events
WITH (
  connector = 'iceberg',
  type = 'append-only',

  warehouse.path = 's3://my-iceberg-demo-bucket/warehouse/',
  s3.region = 'us-west-2',
  s3.access.key = 'YOUR_AWS_ACCESS_KEY_ID',
  s3.secret.key = 'YOUR_AWS_SECRET_ACCESS_KEY',
  enable_config_load = false,

  catalog.type = 'glue',
  glue.region = 'us-west-2',

  database.name = 'rw_quickstart',
  table.name = 'rw_events',

  commit_checkpoint_interval = 1
);
Insert some rows (the sink will deliver them to the Iceberg table):
INSERT INTO local_events VALUES
  (1, 'login',  '2026-01-01 10:00:00'),
  (2, 'click',  '2026-01-01 10:01:00'),
  (3, 'logout', '2026-01-01 10:02:00');
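
To confirm delivery from outside RisingWave, you can count the table's rows from the Athena side. A sketch (the helper name is ours; in real usage, poll the query status instead of sleeping):

```shell
# Count rows in the Iceberg table via Athena to confirm sink delivery.
# Assumes AWS_REGION, GLUE_DB, and ATHENA_OUTPUT are set as in Step 1.
count_rw_events() {
  qid=$(aws athena start-query-execution \
    --region "${AWS_REGION}" \
    --query-string "SELECT count(*) FROM ${GLUE_DB}.rw_events" \
    --result-configuration "OutputLocation=${ATHENA_OUTPUT}" \
    --query 'QueryExecutionId' --output text)
  sleep 5   # crude wait; poll get-query-execution in real usage
  # Row 0 of the result set is the header row; row 1 holds the count.
  aws athena get-query-results \
    --region "${AWS_REGION}" \
    --query-execution-id "$qid" \
    --query 'ResultSet.Rows[1].Data[0].VarCharValue' --output text
}
```

With the two seed rows from Step 2 plus the three rows inserted above, you should see 5, assuming the sink has committed (with `commit_checkpoint_interval = 1`, commits happen on every checkpoint).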

Step 4: Read the same table back as an Iceberg source

CREATE SOURCE glue_events_src
WITH (
  connector = 'iceberg',
  warehouse.path = 's3://my-iceberg-demo-bucket/warehouse/',
  database.name = 'rw_quickstart',
  table.name = 'rw_events',

  catalog.type = 'glue',
  s3.region = 'us-west-2',
  s3.access.key = 'YOUR_AWS_ACCESS_KEY_ID',
  s3.secret.key = 'YOUR_AWS_SECRET_ACCESS_KEY',
  enable_config_load = false
);

SELECT * FROM glue_events_src ORDER BY event_ts;

What you just built

  • An Iceberg table whose metadata is managed by AWS Glue and data lives on S3.
  • RisingWave acting as both a writer (SINK) and a reader (SOURCE) through the Iceberg connector.
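You can see both halves of this on S3: Iceberg keeps data files under a `data/` prefix and table metadata (metadata JSON, manifests) under `metadata/` inside the table location. A quick way to look (the helper name is ours):

```shell
# List the Iceberg table's files on S3: data files plus metadata/manifests.
list_table_files() {
  aws s3 ls --recursive "s3://${ICEBERG_BUCKET}/warehouse/${GLUE_DB}/rw_events/"
}
```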
For reference, see AWS Glue catalog, Deliver data to Iceberg tables, and Ingest data from Iceberg tables.