RisingWave storage overview

RisingWave uses Hummock, a row-based LSM-tree storage engine that stores tables, materialized views, and streaming state in object storage (S3, GCS, Azure Blob). For analytics workloads requiring columnar storage, RisingWave also supports a native Iceberg table engine that stores data in the Apache Iceberg format. This page explains what data gets persisted, where it lives, and how to choose between the two options.

What gets stored (and what doesn’t)

RisingWave persists data for:

Tables
Materialized views (MVs)

By contrast, a source is only a connection to an external system; it does not store data inside RisingWave. If you want RisingWave to keep a durable copy of ingested data, create a connector-backed table instead (CREATE TABLE ... WITH (connector=...)). For a practical comparison, see “CREATE SOURCE vs. CREATE TABLE” and “Source, Table, MV, and Sink”.
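
The difference between the two objects can be sketched in SQL. This is a minimal illustration, not a complete setup: it assumes a Kafka topic named orders, a broker at broker:9092, and JSON-encoded messages. The topic, the address, and all object and column names are hypothetical.

```sql
-- Source: only a connection to Kafka. No rows are persisted in RisingWave;
-- data is read from the topic when a query or MV consumes the source.
CREATE SOURCE orders_src (
    order_id BIGINT,
    amount   DOUBLE PRECISION,
    ts       TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',                              -- hypothetical topic
    properties.bootstrap.server = 'broker:9092'    -- hypothetical broker
) FORMAT PLAIN ENCODE JSON;

-- Connector-backed table: same connector settings, but ingested rows are
-- persisted in RisingWave storage.
CREATE TABLE orders (
    order_id BIGINT,
    amount   DOUBLE PRECISION,
    ts       TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;
```

The only change between the two statements is the object type; the connector options stay the same, which is what makes it easy to start with a source and move to a table later.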

Where the data is persisted

When you create tables or MVs, RisingWave persists their internal state in your configured object store (for example, Amazon S3). Compute nodes may cache hot data locally for performance, but the durable copy is stored in the object store.

Two storage options

RisingWave offers two ways to persist data:

Row-based storage (default) via the Hummock storage engine
Columnar storage using Apache Iceberg

Quick guide: which one should I use?

Workload / requirement	Recommended storage
Low-latency point lookups, “latest state”, frequent updates/deletes	Row-based (Hummock)
Streaming pipelines with MVs that need fast incremental maintenance	Row-based (Hummock)
Large scans, long-range analytics, and lakehouse interoperability	Iceberg (columnar)
Need to query the same dataset from other engines (Spark/Trino/etc.)	Iceberg (columnar)

Row-based storage (Hummock)

By default, tables and materialized views are stored in row-based storage using Hummock, a storage engine designed for streaming updates.

Best for:

Serving up-to-date results with low latency.
Workloads dominated by point queries and short-range scans.
Pipelines with frequent incremental updates (CDC, upserts, streaming aggregations).

Trade-offs:

Not optimized for very large full-table scans compared with columnar formats.
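
As a minimal sketch of the default behavior, the statements below create a table and a materialized view without naming any storage option, so both are stored in Hummock. The table, view, and column names are hypothetical.

```sql
-- A plain table: stored in row-based storage by default.
CREATE TABLE page_views (
    user_id   BIGINT,
    url       VARCHAR,
    viewed_at TIMESTAMP
);

-- A materialized view: its result is maintained incrementally as rows
-- arrive in page_views, and is also stored in row-based storage.
CREATE MATERIALIZED VIEW views_per_user AS
SELECT user_id, COUNT(*) AS view_count
FROM page_views
GROUP BY user_id;

-- A point lookup against the MV, the access pattern this storage favors.
SELECT view_count FROM views_per_user WHERE user_id = 42;
```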

Columnar storage (Apache Iceberg)

RisingWave can store analytical datasets in Apache Iceberg, an open table format widely adopted in the lakehouse ecosystem, with data typically kept in columnar files such as Parquet.

Best for:

Analytical queries that scan lots of data (reporting, dashboards, ad-hoc BI).
Sharing the same tables with external engines that understand Iceberg.
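
The sketch below shows roughly what creating an Iceberg-backed table can look like. It assumes the Iceberg table engine is selected with an ENGINE = iceberg clause and that an Iceberg connection (catalog and object store settings) has already been created and configured for the table engine; that setup is omitted here. The table and column names are hypothetical, so treat this as an illustration and check the Iceberg table engine documentation for the exact prerequisites.

```sql
-- Assumption: an Iceberg connection has already been created and selected
-- for the table engine. Without it, this statement is expected to fail.
CREATE TABLE events_archive (
    event_id   BIGINT PRIMARY KEY,
    payload    VARCHAR,
    created_at TIMESTAMP
) ENGINE = iceberg;

-- The table is queried like any other table; a large scan such as this
-- aggregate is the kind of query columnar storage is meant for.
SELECT COUNT(*) FROM events_archive;
```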

To learn how Iceberg storage works in RisingWave and how to manage it, see:

Common patterns

Retain raw data + compute derived results: Ingest into a table for durable retention, then build MVs for continuously maintained aggregations.
Explore first, persist later: Start with CREATE SOURCE for quick exploration; switch to a connector-backed table when you need durability or performance.
Hybrid analytics: Keep operational/serving state in Hummock, and keep large analytical datasets (or shared lakehouse tables) in Iceberg.

What’s next?

If you’re choosing between connector objects, start with CREATE SOURCE vs. CREATE TABLE.
If you want to expose results to tools and applications, see Access overview.
