RisingWave uses Hummock, a row-based LSM-tree storage engine that stores tables, materialized views, and streaming state in object storage (S3, GCS, Azure Blob). For analytics workloads requiring columnar storage, RisingWave also supports a native Iceberg table engine that stores data in the Apache Iceberg format. This page explains what data gets persisted, where it lives, and how to choose between the two options.
What gets stored (and what doesn’t)
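RisingWave persists data for:
- Tables
- Materialized views (MVs)
RisingWave does not persist data for sources (CREATE SOURCE). To persist ingested data, create a table with a connector (CREATE TABLE ... WITH (connector=...)).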
For a practical comparison, see CREATE SOURCE vs. CREATE TABLE and Source, Table, MV, and Sink.
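
To make the distinction concrete, here is a minimal SQL sketch. The Kafka topic, broker address, and table and column names are hypothetical, and only common connector options are shown; check the connector reference for the options your setup needs.

```sql
-- Hypothetical topic ('orders') and broker ('broker:9092'); adjust to your environment.

-- A source only describes the external stream.
-- RisingWave does not persist its rows.
CREATE SOURCE orders_src (
    order_id BIGINT,
    customer_id BIGINT,
    amount DOUBLE PRECISION
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;

-- A connector-backed table reads the same stream,
-- but the ingested rows are persisted by RisingWave.
CREATE TABLE orders (
    order_id BIGINT,
    customer_id BIGINT,
    amount DOUBLE PRECISION
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;
```

The two statements differ only in the object type, which is why moving from a source to a table later is usually a small change.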
Where the data is persisted
When you create tables or MVs, RisingWave persists their internal state in your configured object store (for example, Amazon S3). Compute nodes may cache hot data locally for performance, but the durable copy is stored in the object store.
Two storage options
RisingWave offers two ways to persist data:
- Row-based storage (default) via the Hummock storage engine
- Columnar storage using Apache Iceberg
Quick guide: which one should I use?
| Workload / requirement | Recommended storage |
|---|---|
| Low-latency point lookups, “latest state”, frequent updates/deletes | Row-based (Hummock) |
| Streaming pipelines with MVs that need fast incremental maintenance | Row-based (Hummock) |
| Large scans, long-range analytics, and lakehouse interoperability | Iceberg (columnar) |
| Need to query the same dataset from other engines (Spark/Trino/etc.) | Iceberg (columnar) |
Row-based storage (Hummock)
By default, tables and materialized views are stored in row-based storage using Hummock, a storage engine designed for streaming updates.
Best for:
- Serving up-to-date results with low latency.
- Workloads dominated by point queries and short-range scans.
- Pipelines with frequent incremental updates (CDC, upserts, streaming aggregations).
Trade-off:
- Not optimized for very large full-table scans compared with columnar formats.
Columnar storage (Apache Iceberg)
RisingWave can store analytical datasets in Apache Iceberg, a widely adopted columnar table format in the lakehouse ecosystem.
Best for:
- Analytical queries that scan lots of data (reporting, dashboards, ad-hoc BI).
- Sharing the same tables with external engines that understand Iceberg.
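
As a rough sketch, a table backed by the Iceberg table engine might be declared as follows. This assumes the ENGINE = iceberg clause and assumes that an Iceberg catalog connection has already been configured for the session; that setup is not shown here, and the table and column names are hypothetical. Verify the exact syntax and prerequisites against the Iceberg table engine reference for your RisingWave version.

```sql
-- Assumption: an Iceberg connection/catalog is already configured for this
-- session (setup omitted). Table and column names are hypothetical.
CREATE TABLE sales_history (
    sale_id BIGINT PRIMARY KEY,
    region VARCHAR,
    amount DOUBLE PRECISION,
    sold_at TIMESTAMP
) ENGINE = iceberg;

-- A scan-heavy analytical query of the kind columnar storage is suited to.
SELECT region, SUM(amount) AS total_amount
FROM sales_history
GROUP BY region;
```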
Common patterns
- Retain raw data + compute derived results: Ingest into a table for durable retention, then build MVs for continuously maintained aggregations.
- Explore first, persist later: Start with CREATE SOURCE for quick exploration; switch to a connector-backed table when you need durability or performance.
- Hybrid analytics: Keep operational/serving state in Hummock, and keep large analytical datasets (or shared lakehouse tables) in Iceberg.
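
The "retain raw data + compute derived results" pattern can be sketched like this. The table, columns, and view name are hypothetical, and the table is declared without a connector to keep the example short.

```sql
-- Raw data is retained in a table (row-based storage by default).
CREATE TABLE page_views (
    user_id BIGINT,
    url VARCHAR,
    viewed_at TIMESTAMP
);

-- Derived results are kept up to date incrementally as rows arrive.
CREATE MATERIALIZED VIEW views_per_user AS
SELECT
    user_id,
    COUNT(*) AS view_count,
    MAX(viewed_at) AS last_viewed_at
FROM page_views
GROUP BY user_id;

-- Point lookup against the maintained result.
SELECT view_count, last_viewed_at
FROM views_per_user
WHERE user_id = 42;
```

The raw table stays available for backfills or new MVs, while the lookup reads the precomputed aggregate instead of rescanning the raw rows.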
What’s next?
- If you’re choosing between connector objects, start with CREATE SOURCE vs. CREATE TABLE.
- If you want to expose results to tools and applications, see Access overview.