This glossary defines the key concepts and terminology used throughout the RisingWave documentation. RisingWave is a PostgreSQL-compatible streaming database — understanding these terms will help you navigate the documentation and use the system effectively.

Avro

Avro is an open-source data serialization system that facilitates data exchange between systems, programming languages, and processing frameworks. Avro has a JSON-like data model, and data can be represented either as JSON or in a compact binary form. RisingWave can decode Avro data. You need to specify the schema by providing a schema registry URL (currently supported only for Kafka topics).
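As a sketch, a Kafka source that decodes Avro via a schema registry might look like the following. The source name, topic, broker address, and registry URL are all placeholders.

```sql
-- Hypothetical Kafka source decoding Avro through a schema registry.
-- Topic, broker, and registry URL are illustrative placeholders.
CREATE SOURCE avro_events
WITH (
    connector = 'kafka',
    topic = 'events',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE AVRO (
    schema.registry = 'http://schema-registry:8081'
);
```

The column definitions are inferred from the registered Avro schema, so no explicit column list is needed.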

Change data capture (CDC)

Change data capture refers to the process of identifying and capturing changes as they are made in a database or source application, then delivering those changes in real time to a downstream process, system, or data lake. RisingWave supports ingesting CDC events in Debezium JSON or Maxwell JSON from Kafka topics. For a deep dive, see What is CDC in RisingWave?.
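For illustration, ingesting Debezium-formatted CDC events from a Kafka topic into a persisted table might look like this sketch; the table name, columns, topic, and broker address are placeholders.

```sql
-- Hypothetical table ingesting Debezium JSON CDC events from Kafka.
-- A PRIMARY KEY is required so updates and deletes can be applied.
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    amount DOUBLE PRECISION
) WITH (
    connector = 'kafka',
    topic = 'pg.public.orders',
    properties.bootstrap.server = 'broker:9092'
) FORMAT DEBEZIUM ENCODE JSON;
```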

Clusters

A group of interconnected nodes and services that act as a single system running an instance of RisingWave. A typical cluster includes Serving Nodes (query frontend and batch execution), Streaming Nodes (streaming ingestion and incremental computation), Compactor Nodes (Hummock storage compaction), and one or more Meta Nodes (cluster coordination and metadata management). In distributed deployments, each node type scales independently based on workload characteristics.

Compute Nodes

A Compute Node is the core processing unit of a RisingWave cluster, running as the risingwave compute-node process. It can operate in one of three roles:
  • hybrid (default): handles both streaming ingestion and batch query execution in the same process.
  • streaming: dedicated to stream processing only. A Compute Node in this role is called a Streaming Node.
  • serving: dedicated to batch query execution only. A Compute Node in this role is called a Serving Node.
In small or development deployments a single hybrid Compute Node handles everything. In production you can assign separate Compute Nodes to the streaming and serving roles to prevent stream processing and ad-hoc queries from competing for the same resources. See Dedicated Compute Nodes.

Compactor Nodes

A Compactor Node handles data storage and retrieval from object storage. It also performs data compaction to optimize storage efficiency.

Connection

A connection allows access to services located outside of your VPC. AWS PrivateLink, for example, provides private connectivity between VPCs, private networks, and other services. In RisingWave, the CREATE CONNECTION command establishes a connection between RisingWave and an external service; a source or sink can then be created on top of it to receive or send messages.
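A hedged sketch of an AWS PrivateLink connection follows; the connection name and service name are placeholders, and the exact parameter set may vary by RisingWave version.

```sql
-- Hypothetical AWS PrivateLink connection; the endpoint service
-- name below is a placeholder, not a real service.
CREATE CONNECTION my_privatelink WITH (
    type = 'privatelink',
    provider = 'aws',
    service.name = 'com.amazonaws.vpce.us-east-1.vpce-svc-example'
);
```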

Data persistence

Data persistence means that data survives after the process that generated the data has ended. For a database to be considered persistent, it must write to non-volatile storage. This type of storage is able to retain data in the absence of a power supply. To learn about how data is persisted in RisingWave, see Data persistence.

Debezium

Debezium is an open-source distributed platform for change data capture (CDC). It converts change records from existing databases into event streams in the form of Kafka topics. Debezium provides a unified format schema for changelog and supports serializing messages in JSON and Apache Avro.

Fragments

In RisingWave, when a streaming query plan executes, it is divided into multiple independent fragments to allow parallel execution. Each fragment is a chain of SQL operators and is executed under the hood by a set of parallel actors. Different fragments can run with different degrees of parallelism. You can control fragment-level parallelism using the ALTER FRAGMENT command for fine-grained performance tuning.

Indexes

Indexes in a database are typically created on one or more columns of a table, allowing the database management system (DBMS) to locate and retrieve the desired data quickly. This can significantly improve query performance, especially for large or frequently accessed tables. In RisingWave, indexes can speed up batch queries.
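For example, a point lookup on a non-key column can be accelerated by an index; the table and column names here are illustrative.

```sql
-- Create an index so batch point queries on customer_id avoid a full scan.
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- The optimizer can now use the index for this lookup.
SELECT * FROM orders WHERE customer_id = 42;
```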

Materialized Views

When the results of a view expression are stored in a database system, they are called materialized views. In RisingWave, the result of a materialized view is updated whenever a relevant event arrives in the system. When you query a materialized view, the results are returned almost instantly, because the computation was already performed as the data came in. You need to use the CREATE MATERIALIZED VIEW statement to create a materialized view. For a deep dive, see What is a Materialized View in RisingWave?.
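A minimal sketch, assuming a hypothetical orders table:

```sql
-- Incrementally maintained per-customer order count.
CREATE MATERIALIZED VIEW order_counts AS
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

-- Reads return precomputed results; nothing is recomputed on access.
SELECT * FROM order_counts WHERE customer_id = 42;
```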

Meta Nodes

The Meta Node is the control plane of a RisingWave cluster. It manages cluster-wide metadata: tracking which streaming fragments run on which Compute Nodes, scheduling checkpoints, handling DDL operations such as creating and dropping streaming jobs, and coordinating the barrier injection that drives consistent snapshot isolation across the streaming graph. A cluster typically has one active Meta Node, with a standby for high availability.

Nodes

A node is a logical collection of IT resources that handles specific workloads based on its type. RisingWave has the following types of nodes:
  • Compute Nodes (running as Streaming Nodes, Serving Nodes, or hybrid)
  • Compactor Nodes
  • Meta Nodes

Object storage

Object storage, or object-based storage, is a technology that stores data in a hierarchy-free manner. Data in object storage exists as discrete units (objects) at the same level in a storage pool. Each object has a unique, identifying name that an application uses to retrieve it. The benefits of using object storage include massive scalability and cost efficiency.

Parallelism

Parallelism refers to the technique of simultaneously executing multiple database operations or queries to improve performance and increase efficiency. It involves dividing a database workload into smaller tasks and executing them concurrently on multiple processors or machines. In RisingWave, you can set the parallelism of streaming jobs such as tables, materialized views, and sinks.
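As a sketch, the parallelism of an existing streaming job can be adjusted with an ALTER statement; the materialized view name is a placeholder.

```sql
-- Pin a hypothetical materialized view's parallelism to 8 actors per fragment.
ALTER MATERIALIZED VIEW order_counts SET PARALLELISM = 8;

-- Or let the system choose based on available resources.
ALTER MATERIALIZED VIEW order_counts SET PARALLELISM = ADAPTIVE;
```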

Protobuf

Protocol Buffers (commonly known as Protobuf) is Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. It is similar to XML, but smaller, faster, and simpler. RisingWave supports decoding Protobuf data. When creating a source that uses the Protobuf format, you need to specify the schema. For details about the requirements, see Protobuf requirements.

psql

psql is a terminal-based front-end to PostgreSQL and other databases that are compatible with the PostgreSQL wire protocol, such as RisingWave. With psql, you can type queries interactively, issue these queries to RisingWave, and see the query results. In addition, psql provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.

risedev

risedev is a command-line utility that facilitates the development of RisingWave. It is built on cargo-make, a task runner for Rust projects. risedev is included in RisingWave’s source code. Additionally, risectl is integrated into risedev as a subcommand: risedev ctl.

risectl

risectl is a command-line tool for managing the RisingWave kernel. It allows you to inspect the cluster status and access low-level APIs for cluster control. This tool is included with each version release. Use it at your own risk. Run risectl help for more details.

RisingWave Console

For a web-based interface to manage and monitor your RisingWave clusters, you can use the RisingWave Console. It provides a user-friendly alternative to risectl commands, with additional features such as diagnostic collection and automated monitoring.

risingwave

risingwave is the standalone binary for the RisingWave database. You can launch different node types using subcommands; for instance, risingwave compute-node launches a Compute Node. Run risingwave help for more details. risectl is also incorporated into risingwave as a subcommand, risingwave ctl.

rwc

rwc (RisingWave Cloud CLI) is a command-line tool for accessing and managing RisingWave Cloud/BYOC instances via open APIs. It is particularly useful for setting up the BYOC cluster. For installation and setup instructions, see Install the RisingWave Cloud CLI.

Serialization

In stream processing, serialization is the process of converting business objects into bytes so that they can be easily saved or transmitted. The reverse process, recreating the data structure or object from a stream of bytes, is called deserialization. Common data serialization formats include JSON, Avro, Protobuf (protocol buffers), and CSV.

Serving Nodes

A Serving Node is a Compute Node running with the serving role. It has the following responsibilities:
  • Acts as a frontend: It handles user requests and is compatible with the PostgreSQL wire protocol, allowing tools like psql to connect seamlessly.
  • Processes queries: It executes batch queries directly. For streaming queries, it generates an execution plan and dispatches it to the stream engine.

Batch query execution

When executing batch queries (ad-hoc SELECT statements), RisingWave uses a set of specialized executors to process data:
  • Scan executors: Read data from tables, materialized views, or external sources (including Iceberg, S3, PostgreSQL, MySQL)
  • Join executors: Perform hash joins, nested loop joins, and lookup joins
  • Aggregation executors: Compute aggregations using hash-based or sort-based algorithms
  • Sort and limit executors: Order results and apply pagination
  • Project and filter executors: Transform and filter data
Batch queries read from consistent snapshots of data stored in the Hummock storage engine. Queries can be executed in three modes:
  • distributed: Parallel execution across multiple nodes (default)
  • local: Single-node execution for simple queries
  • auto: System-determined mode based on query characteristics
For complex analytical workloads, RisingWave supports the full TPC-H benchmark suite, demonstrating support for multi-way joins, complex aggregations, subqueries, and window functions.
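The execution mode described above can be switched per session; for example:

```sql
-- query_mode accepts 'distributed', 'local', or 'auto'.
SET query_mode = 'local';

-- Subsequent batch queries in this session run in local mode.
SELECT COUNT(*) FROM orders;
```

The table name here is a placeholder; local mode is typically preferable for simple, low-latency point queries.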

Sinks

A sink is an external target to which RisingWave continuously delivers processed data. Supported destinations include Apache Kafka, Apache Iceberg, relational databases (PostgreSQL, MySQL, TiDB), object storage (S3, GCS, Azure Blob), Elasticsearch, ClickHouse, and more. A sink consumes from a source, table, materialized view, or an inline query, and writes results downstream as they are produced. Before streaming data out, you create a sink using the CREATE SINK statement. For a deep dive, see What is a Sink in RisingWave?.
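As an illustration, a Kafka sink delivering changes from a hypothetical orders table might look like this sketch; the topic, broker address, and key column are placeholders.

```sql
-- Hypothetical Kafka sink; UPSERT format emits keyed change events.
CREATE SINK orders_sink FROM orders
WITH (
    connector = 'kafka',
    topic = 'orders-out',
    properties.bootstrap.server = 'broker:9092',
    primary_key = 'order_id'
) FORMAT UPSERT ENCODE JSON;
```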

Sources

A source is a resource that RisingWave can read data from. Common sources include message brokers such as Apache Kafka and Apache Pulsar and databases such as MySQL and PostgreSQL. You can create a source in RisingWave using the CREATE SOURCE command. If you want to persist the data from the source, you should use the CREATE TABLE command with connector settings. Regardless of whether the data is persisted in RisingWave, you can create materialized views to perform data transformations. For a deep dive, see What is a Source in RisingWave?.
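A minimal sketch of a non-persisting Kafka source follows; the column list, topic, and broker address are placeholders.

```sql
-- Hypothetical JSON source over Kafka. Data is not stored in RisingWave;
-- to persist it, use CREATE TABLE with the same connector settings.
CREATE SOURCE user_events (
    user_id INT,
    event_type VARCHAR,
    event_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'user-events',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;
```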

Stream processing

Stream processing is the processing of data in motion, or in other words, computing on data directly as it is produced or received. Most data is born as continuous streams: sensor events, user activity on a website, financial trades, and so on. All of this data is created as a series of events over time. For a deep dive, see What is Stream Processing?.

Streaming actors

RisingWave distributes its computation across lightweight threads called “streaming actors,” which run simultaneously on CPU cores. By spreading these streaming actors across cores, RisingWave achieves parallel computation, resulting in improved performance, scalability, and throughput.

Session variables and optimizer tuning

RisingWave provides session-level configuration variables to control query planning and execution behavior:
  • streaming_force_filter_inside_join: Push filters into join operators in streaming queries
  • streaming_enable_unaligned_join: Enable log store buffers after high-amplification joins
  • streaming_use_snapshot_backfill: Use snapshot isolation for MV/sink backfill phase
  • streaming_use_arrangement_backfill: Use arrangement-based backfill strategy
  • streaming_allow_jsonb_in_stream_key: Allow JSONB columns in stream keys (impacts performance)
Use SET <variable> = <value>; to modify these settings. Most are disabled by default for safety/performance reasons.
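For example, a variable can be inspected and toggled for the current session using standard PostgreSQL-style syntax:

```sql
-- Show the current value of a session variable.
SHOW streaming_use_arrangement_backfill;

-- Enable it for this session only.
SET streaming_use_arrangement_backfill = true;
```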

Streaming database

A streaming database is broadly defined as a data store designed to collect, process, and/or enrich streams of data in real time, typically immediately after the data is created. For a deep dive, see What is a Streaming Database?.

Streaming Nodes

A Streaming Node is a Compute Node running with the streaming role. It is responsible for ingesting data from upstream systems, executing the streaming query graph incrementally, and delivering results to downstream systems. Each Streaming Node runs a set of streaming actors that process partitions of the data stream in parallel. All durable state is stored in the Hummock storage engine backed by object storage, so Streaming Nodes can be scaled out or in without data migration.

Streaming queries

A streaming query, also known as a streaming job, is a SQL query that operates on data that is continuously generated. In RisingWave, the following SQL statements are considered streaming queries: CREATE SOURCE, CREATE TABLE (with connector settings), CREATE MATERIALIZED VIEW, CREATE INDEX, and CREATE SINK.

Views

A view is a virtual relation that behaves like an actual relation but is not part of the logical relational model of the database system. Only the query expression of the view is stored in the database system. The results of a non-materialized view are not stored and are recalculated every time the view is accessed.
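A minimal sketch, assuming a hypothetical orders table:

```sql
-- A non-materialized view: only the query text is stored,
-- and it is re-evaluated on every access.
CREATE VIEW recent_orders AS
SELECT * FROM orders
WHERE order_time > NOW() - INTERVAL '1 day';
```

Contrast this with CREATE MATERIALIZED VIEW, where results are stored and maintained incrementally.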

Wire protocol

A wire protocol is the format for interactions between a database server and its clients. It covers authentication, sending queries, and receiving responses. The wire protocol for PostgreSQL is called pgwire. If a tool or database is compatible with pgwire, it can interoperate with most tools and clients in the PostgreSQL ecosystem.