Apache Iceberg is an open table format that provides reliable, high-performance storage for data lakes. RisingWave integrates with Iceberg to combine real-time computation with open data lake storage. You can work with external Iceberg tables managed by other systems or create internal Iceberg tables managed directly by RisingWave.

External Iceberg tables

External Iceberg tables are managed outside of RisingWave, such as S3 Tables, Snowflake-managed Iceberg tables, Databricks-managed Iceberg tables, or self-managed Iceberg deployments. RisingWave connects to these tables through their catalogs and treats them as data sources or data sinks.

Reading from external Iceberg tables

RisingWave can continuously read data from append-only Iceberg tables. It monitors snapshots and automatically loads newly appended data, allowing you to consume the table as a live data stream.
Example
CREATE SOURCE orders_src
WITH (
  connector = 'iceberg',
  warehouse.path = 's3://lakehouse/warehouse',
  database.name = 'sales',
  table.name = 'orders',
  catalog.type = 'glue',
  s3.region = 'us-west-2'
);

Ad hoc analytics on Iceberg data

After the source is created, you can query it directly with SQL. RisingWave retrieves the current snapshot of the Iceberg table at query time.
-- Inspect recent orders for a specific product
SELECT order_id, user_id, amount, order_ts
FROM orders_src
WHERE product_id = 12345
ORDER BY order_ts DESC
LIMIT 20;

-- Aggregate daily revenue over the past week
SELECT date_trunc('day', order_ts) AS day, SUM(amount) AS revenue
FROM orders_src
WHERE order_ts >= now() - interval '7 days'
GROUP BY 1
ORDER BY day;
These queries run on demand against the latest Iceberg snapshot, making RisingWave useful for interactive analytics without setting up a separate ETL process.

Continuous analytics with materialized views

For real-time, incremental analytics, create a materialized view on the Iceberg source. RisingWave automatically keeps the view up to date as new snapshots are committed to the Iceberg table.
CREATE MATERIALIZED VIEW mv_daily_sales AS
SELECT
  date_trunc('day', order_ts) AS day,
  SUM(amount) AS total_sales,
  COUNT(DISTINCT user_id) AS active_users
FROM orders_src
GROUP BY 1;
As RisingWave ingests new Iceberg snapshots, the view's results update incrementally. You can query the view directly:
SELECT * FROM mv_daily_sales ORDER BY day DESC LIMIT 10;

This approach provides low-latency analytics on Iceberg data while maintaining compatibility with the underlying catalog and storage system.

Writing to external Iceberg tables

RisingWave can also write query results or materialized view outputs to Iceberg tables. The resulting data remains fully compatible with other Iceberg engines such as Spark, Trino, and DuckDB.
  • Supports append-only, upsert, and force-append-only data modes
  • Guarantees exactly-once delivery
  • Can perform optional file compaction for efficiency
Example
CREATE SINK daily_sales_sink FROM mv_daily_sales
WITH (
  connector = 'iceberg',
  type = 'append-only',
  warehouse.path = 's3://lakehouse/warehouse',
  database.name = 'sales',
  table.name = 'daily_sales',
  catalog.type = 'rest',
  catalog.uri = 'http://lakekeeper:8181',
  enable_compaction = true
);

With this configuration, RisingWave acts as a real-time transformation layer between streaming systems and Iceberg storage.
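For tables that receive updates rather than pure appends, the sink can be created in upsert mode instead. The sketch below is illustrative: the source view orders_enriched and its key column order_id are assumed names, and the primary_key option designates the column used to deduplicate rows in the target table.
-- Hypothetical upsert sink; orders_enriched and order_id are example names
CREATE SINK orders_upsert_sink FROM orders_enriched
WITH (
  connector = 'iceberg',
  type = 'upsert',
  primary_key = 'order_id',
  warehouse.path = 's3://lakehouse/warehouse',
  database.name = 'sales',
  table.name = 'orders_enriched',
  catalog.type = 'rest',
  catalog.uri = 'http://lakekeeper:8181'
);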

Internal Iceberg tables

Internal Iceberg tables are created and managed directly by RisingWave. They behave like standard RisingWave tables but store their data in Iceberg format on object storage. This allows you to persist computed or aggregated results in an open format that can be accessed by other query engines.

Creating internal Iceberg tables

You can create internal Iceberg tables using standard SQL syntax.
CREATE TABLE fact_orders (
  order_id BIGINT,
  user_id BIGINT,
  amount DOUBLE PRECISION,
  ts TIMESTAMP
) ENGINE = iceberg;
RisingWave automatically manages schema, metadata, and data persistence. Data is stored in Parquet format and can be queried by any Iceberg-compatible engine such as Spark, Trino, or DuckDB. You can query, join, and build materialized views on internal tables just like any other RisingWave table.
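For instance, continuing the fact_orders example above, an incrementally maintained rollup can be built directly on the Iceberg-backed table (the view name and aggregation are illustrative):
-- Incrementally maintained aggregate over the Iceberg-backed table
CREATE MATERIALIZED VIEW mv_user_spend AS
SELECT user_id, SUM(amount) AS total_spend
FROM fact_orders
GROUP BY user_id;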

Catalog service

RisingWave provides two hosted catalog options for managing Iceberg metadata, schema versions, and table state:
  • JDBC hosted catalog — backed by RisingWave’s internal PostgreSQL-compatible metastore. See JDBC hosted catalog
  • REST hosted catalog — powered by Lakekeeper and compatible with the Iceberg REST catalog API. See REST hosted catalog
Both options allow external Iceberg engines to read and write RisingWave-managed tables using standard Iceberg protocols. If you prefer to use an existing metadata system, RisingWave also supports external catalogs such as AWS Glue, Hive Metastore, or Nessie. For details, see Catalog configuration.
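As a sketch of the interoperability this enables, an external engine such as Spark SQL, once configured against the hosted REST catalog, could read a RisingWave-managed table with plain SQL. The catalog name rw below is an assumption; the database and table names follow the fact_orders example above.
-- From an external engine (e.g. Spark SQL) with RisingWave's
-- REST catalog registered under the assumed name `rw`
SELECT order_id, amount
FROM rw.sales.fact_orders
LIMIT 10;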

Compaction service

RisingWave provides a built-in compaction service that automatically merges small Parquet files, expires outdated snapshots, and maintains efficient file layouts. This ensures good query performance and stable storage usage during continuous ingestion.

Using RisingWave’s compaction service is optional. You can also connect an external compactor such as Tabular’s Iceberg compactor, Databricks-managed compaction, Amazon EMR, or a self-hosted Spark job. When using an external compactor, RisingWave writes data in a compaction-friendly format that allows other systems to safely perform maintenance. Compaction can be enabled or disabled per table or configured globally, depending on performance and cost requirements.
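For example, when a table is maintained by an external compactor, RisingWave's built-in compaction can be turned off on the sink via the enable_compaction option (shown enabled in the sink example earlier; the sink and table names here are illustrative):
-- Hypothetical sink whose files are compacted externally
CREATE SINK raw_events_sink FROM raw_events
WITH (
  connector = 'iceberg',
  type = 'append-only',
  warehouse.path = 's3://lakehouse/warehouse',
  database.name = 'sales',
  table.name = 'raw_events',
  catalog.type = 'rest',
  catalog.uri = 'http://lakekeeper:8181',
  enable_compaction = false
);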

Catalog and compaction summary

Component          | Default option                   | Alternative options                          | Description
-------------------|----------------------------------|----------------------------------------------|------------------------------------------------------
Catalog service    | RisingWave built-in REST catalog | Glue, Hive, Nessie, or custom REST catalogs  | Stores metadata and schema information
Compaction service | RisingWave built-in compactor    | Amazon EMR or self-hosted Spark              | Optimizes file layout and merges small Parquet files

Typical architecture

[ Kafka / CDC / APIs ]
          |
          v
    RisingWave SQL Engine
   ├─ Reads from external Iceberg
   ├─ Performs real-time computation
   ├─ Builds materialized views
   └─ Writes results to internal or external Iceberg
          |
          v
[ Iceberg Tables in Object Storage ]

RisingWave connects streaming systems with Iceberg-based data lakes. Use external Iceberg tables to analyze or enrich existing datasets, and internal Iceberg tables to persist computed results in an open, queryable format.

Summary

Capability         | External Iceberg tables                  | Internal Iceberg tables
-------------------|------------------------------------------|--------------------------------------------
Read support       | Continuous and ad-hoc queries            | Supported
Write support      | Append, upsert, or force-append-only     | Fully managed by RisingWave
Catalog ownership  | External system                          | RisingWave or external catalog
Compaction         | Optional via sink or external compactor  | Optional via RisingWave or external compactor
Interoperability   | Compatible with other Iceberg engines    | Compatible with the Iceberg standard
Typical use        | Connect to existing Iceberg data         | Persist computed or aggregated data

Choosing between external and internal tables

  • Use external Iceberg tables if you already have an Iceberg environment such as S3 Tables, Snowflake, or Databricks, and want RisingWave to process or update that data.
  • Use internal Iceberg tables if you want RisingWave to handle both computation and Iceberg data management with its built-in catalog and compaction services.
  • Combine both approaches to build a unified, real-time lakehouse architecture.

Next steps

  • Read from Iceberg tables
  • Write to Iceberg tables
  • Create and manage internal Iceberg tables
  • Configure catalogs and compaction services