To build real-time data applications in RisingWave, you must understand four core objects: Table, Source, Materialized View (MV), and Sink. They represent storage, data ingress, real-time computation, and data egress, respectively. Common beginner questions include:
  • Both Table and Source can connect to external systems. What’s the difference?
  • Why are Tables required for CDC?
  • Must a Sink consume from an MV?
  • Can Iceberg or Kafka Sources be queried directly?
This guide clarifies these concepts with comparison tables, examples, and diagrams.

High-level comparison

To quickly grasp the differences, the following table illustrates how these four objects vary in terms of storage, queryability, and typical use cases.
Object | Persists Data? | Queryable? | When To Use
Table | ✅ | ✅ | Persisting data, querying, serving as input for MVs or Sinks
Source | ❌ | Depends on the connector (see below) | Data ingress, ad-hoc exploration, building Tables or MVs
MV | ✅ | ✅ | Real-time aggregation, transformation, analysis
Sink | ❌ | ❌ | Writing results back to Kafka, databases, or object storage
Both Tables and MVs store data within RisingWave. The key difference is that a Table holds the current state of the data (which can be populated internally or from an external source), whereas an MV is the persisted result of a computation. A Source acts as a data entry point and does not store data, making it suitable for exploration and validation. A Sink does not retain data inside RisingWave but continuously outputs results.
[Figure: RisingWave Objects Relationship]

Connector support matrix

Users often ask: “Can I query this Source directly? Do I need to create a Table for CDC connectors?” The matrix below shows which connectors support direct Source queries and which require creating a Table.
Connector Type | Directly Query Source? | Primary Key Required?
Kafka / Pulsar / Kinesis / NATS / MQTT / PubSub | ✅ | ❌
S3 / GCS / Azure Blob | ✅ | ❌
PostgreSQL CDC | ❌ | ✅
MySQL CDC | ❌ | ✅
SQL Server CDC | ❌ | ✅
MongoDB CDC | ❌ | ✅
Iceberg | ✅ | ❌ (unless declared for writes)
Sources like Kafka, S3, and Iceberg can be queried directly, while CDC Sources must be materialized into a Table. CDC Tables must have a primary key defined, whereas Kafka/S3 Tables do not.

Table and Source

Table: the foundation of storage in RisingWave

A Table in RisingWave is a persistent data object that stores a collection of rows, behaving much like a traditional database table. It supports storing, querying, and updating data. There are two ways to use a Table.
First, as an internal table into which users insert data manually:
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR);
INSERT INTO users VALUES (1, 'Alice');
Second, as a connector-backed table that is bound to an external data source through a specified connector (e.g., Kafka, Postgres CDC, Iceberg):
CREATE TABLE events (
  user_id INT,
  action VARCHAR,
  ts TIMESTAMP
) WITH (
  connector = 'kafka',
  topic = 'user_events',
  properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON; -- assumes the topic carries JSON-encoded messages
Regardless of the method, a Table stores data internally within RisingWave and maintains offsets/checkpoints to ensure fault tolerance and data consistency. Therefore, for any pipeline involving CDC or long-running jobs, a Table is indispensable.

Source: a lightweight ingress definition

Unlike a Table, a Source only defines how to connect to an external system and does not store any data within RisingWave. For Sources like Kafka, S3, and Iceberg, you can execute SELECT queries directly:
CREATE SOURCE kafka_src (
  user_id INT,
  action VARCHAR,
  ts TIMESTAMP
) WITH (
  connector = 'kafka',
  topic = 'user_events',
  properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON; -- assumes the topic carries JSON-encoded messages

SELECT * FROM kafka_src LIMIT 10;
However, CDC Sources (e.g., Postgres/MySQL CDC) cannot be queried directly. CDC streams contain update and delete operations, which must first be materialized into a Table so that these changes are applied correctly.
A Source is lightweight and flexible, which makes it ideal for ad-hoc exploration or as an input for an MV. Note, however, that a Sink’s upstream cannot be a Source directly.
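As a minimal sketch of using a Source as the input for an MV, the statement below maintains a running count per action on top of kafka_src from the example above. The view name action_counts and the column alias event_count are illustrative assumptions, not names defined by this guide.

```sql
-- Hypothetical MV name; kafka_src is the Source defined earlier in this section.
CREATE MATERIALIZED VIEW action_counts AS
SELECT action, COUNT(*) AS event_count
FROM kafka_src
GROUP BY action;

-- The MV persists its result, so it can be queried like a table.
SELECT * FROM action_counts ORDER BY event_count DESC;
```

The Source itself still stores nothing; only the aggregated result held by the MV is persisted.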

CDC always requires creating a table

What makes a CDC (Change Data Capture) stream special is that it includes updates and deletes, and RisingWave must rely on a Table to process these events correctly. There are three reasons:
  • Row identity. If a user executes UPDATE users SET name='Bob' WHERE id=1 in PostgreSQL, RisingWave needs to know which row to update. Only a Table, which stores the data’s state and has a primary key, supports these semantics.
  • Recovery. A CDC stream is a continuous log of changes. With only a Source, consistency could not be restored after a task interruption, whereas a Table can resume from the last checkpoint thanks to its persistence and offset tracking.
  • Transactions. A single database transaction might update multiple tables, and RisingWave relies on Tables to apply transactional boundaries correctly.
Therefore, for CDC, you must create a Table. You can either create a CDC Table directly or create a Source first and then derive a Table from it.
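The role of the primary key can be illustrated locally, without any CDC connector. The sketch below uses a hypothetical internal table, users_demo, and applies the same kind of keyed update that an upstream change event would carry; it is an analogy for how state plus a key makes an update well defined, not a description of RisingWave's internal CDC machinery.

```sql
-- Hypothetical local table; in a real CDC setup the rows would arrive
-- from the upstream database instead of manual INSERTs.
CREATE TABLE users_demo (id INT PRIMARY KEY, name VARCHAR);
INSERT INTO users_demo VALUES (1, 'Alice'), (2, 'Carol');

-- The key identifies exactly one stored row to change, which is what
-- applying an upstream update event requires.
UPDATE users_demo SET name = 'Bob' WHERE id = 1;

-- Make the preceding DML visible, then inspect the current state.
FLUSH;
SELECT * FROM users_demo ORDER BY id;
```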

Shared Sources and consistency

For PostgreSQL CDC, MySQL CDC, and SQL Server CDC scenarios, RisingWave provides a shared source mechanism. A user can create a single CDC Source and then derive multiple Tables from it:
CREATE SOURCE pg_src WITH (
  connector = 'postgres-cdc',
  hostname = '127.0.0.1',
  port = '5432',
  username = 'user',
  password = 'pwd',
  database.name = 'mydb'
);

CREATE TABLE users (...) FROM pg_src TABLE 'public.users';
CREATE TABLE orders (...) FROM pg_src TABLE 'public.orders';
[Figure: CDC Handling in RisingWave]
The benefits of this approach are reduced configuration duplication and guaranteed cross-table transactional consistency, as all tables consume the same change log stream. If each table established a separate connection, the atomicity of cross-table transactions could be lost.
Since v2.1, Kafka Sources also support the shared source feature. Unlike CDC’s shared source, which is used for transactional consistency, Kafka’s shared source primarily aims to avoid redundant data consumption by allowing multiple downstream consumers to share a single SourceExecutor, thereby improving efficiency.
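The column lists in the statements above are elided. As a hedged sketch of what complete definitions might look like, the block below assumes simple upstream schemas; every column name and type here is hypothetical and would need to match the actual PostgreSQL tables. It also adds an illustrative MV, user_order_totals, that joins the two derived tables.

```sql
-- Column names and types are illustrative assumptions; they must match
-- the real upstream tables. CDC Tables need a primary key.
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR
) FROM pg_src TABLE 'public.users';

CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  user_id INT,
  amount DECIMAL
) FROM pg_src TABLE 'public.orders';

-- Both Tables read from the same shared Source, so a join across them
-- can be maintained as an MV (hypothetical name).
CREATE MATERIALIZED VIEW user_order_totals AS
SELECT u.id, u.name, SUM(o.amount) AS total_amount
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
```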

Materialized View

The Materialized View (MV) is the core of real-time computation in RisingWave. It is defined by a SQL query, and the system automatically maintains the result table, refreshing it in real time as the underlying data changes.
CREATE MATERIALIZED VIEW daily_sales AS
SELECT date_trunc('day', ts) AS d, SUM(amount) AS total
FROM sales
GROUP BY d;
MVs provide low-latency queries and allow for the reuse of computation results, making them especially suitable for real-time aggregation, data cleaning, and analysis. In most cases, the upstream of a Sink will be an MV.
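Reuse of computation results can be sketched by layering one MV on another. The example below assumes daily_sales exists as defined above and rolls it up by week; the name weekly_sales and its column aliases are hypothetical.

```sql
-- Hypothetical second-level MV built on top of daily_sales.
-- It consumes the already-aggregated daily totals rather than
-- re-reading the raw sales rows.
CREATE MATERIALIZED VIEW weekly_sales AS
SELECT date_trunc('week', d) AS w, SUM(total) AS total
FROM daily_sales
GROUP BY date_trunc('week', d);

-- Reads are served from the stored result.
SELECT * FROM weekly_sales ORDER BY w DESC LIMIT 4;
```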

Sink

A sink is the egress point of RisingWave, used to write data to external systems. The input for a sink can be:
  • An existing object such as a source, table, or materialized view, which is defined by the FROM clause in a CREATE SINK statement.
  • A query, which is defined by CREATE SINK AS SELECT <select_query> syntax.
If you want to write results back into an internal RisingWave table, you can use the CREATE SINK INTO command. A common use case for this is to union multiple sources and write them into a single table.
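The three syntaxes can be sketched as follows. This is an illustration under stated assumptions rather than a reference: the sink, topic, and table names are hypothetical, the broker address reuses the one from earlier examples, and the Kafka options shown (an upsert format keyed on d) are one plausible configuration for an aggregated MV. Check the connector documentation for the options your target system requires.

```sql
-- 1. FROM: sink an existing object (here the daily_sales MV defined earlier).
CREATE SINK daily_sales_sink FROM daily_sales
WITH (
  connector = 'kafka',
  properties.bootstrap.server = 'broker:9092',
  topic = 'daily_sales',            -- hypothetical topic
  primary_key = 'd'                 -- key used for upsert messages
) FORMAT UPSERT ENCODE JSON;

-- 2. AS SELECT: sink the result of a query.
CREATE SINK big_days_sink AS
SELECT d, total FROM daily_sales WHERE total > 1000
WITH (
  connector = 'kafka',
  properties.bootstrap.server = 'broker:9092',
  topic = 'big_days',               -- hypothetical topic
  primary_key = 'd'
) FORMAT UPSERT ENCODE JSON;

-- 3. INTO: union several inputs into one internal table.
-- All three tables below are hypothetical.
CREATE TABLE web_events (user_id INT, action VARCHAR, ts TIMESTAMP) APPEND ONLY;
CREATE TABLE app_events (user_id INT, action VARCHAR, ts TIMESTAMP) APPEND ONLY;
CREATE TABLE all_events (user_id INT, action VARCHAR, ts TIMESTAMP);

CREATE SINK web_into_all INTO all_events FROM web_events;
CREATE SINK app_into_all INTO all_events FROM app_events;
```

With the INTO form, each sink feeds the same target table, so downstream MVs can read all_events as a single input.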

Summary

Here’s a quick recap of the four core objects in RisingWave:
  • Table: The storage layer. Mandatory for CDC scenarios. Stores the current state of data.
  • Source: A lightweight ingress point. Non-CDC Sources can be queried directly, while CDC Sources must be materialized into a Table.
  • MV: The result table of real-time computation. Recommended as the upstream for a Sink.
  • Sink: The output layer. Supports three creation syntaxes: FROM, AS SELECT, and INTO.