To build real-time data applications in RisingWave, you must understand four core objects: Table, Source, Materialized View (MV), and Sink. They represent storage, data ingress, real-time computation, and data egress. Common beginner questions include: Both Table and Source can connect to external systems—what’s the difference? Why are Tables required for CDC? Must a Sink consume from an MV? Can Iceberg or Kafka Sources be queried directly? This guide clarifies these concepts with comparison tables, examples, and diagrams.

Conceptual comparison: Table, Source, MV, and Sink

To quickly grasp the differences, the following table illustrates how these four objects vary in terms of storage, queryability, updates, and typical use cases.
| Object | Stores Data | Directly Queryable | Continuously Updated | Typical Use Case |
| --- | --- | --- | --- | --- |
| Table | ✅ Stored within RisingWave | ✅ Directly queryable | Supports updates (from inserts or external sources) | Persisting data, querying, serving as input for MVs or Sinks |
| Source | ❌ Does not store; only connects to external systems | Depends on the connector (Kafka/S3/Iceberg are queryable, CDC is not) | Reads from upstream in real time | Data ingress, ad-hoc exploration, building Tables or MVs |
| Materialized View (MV) | ✅ Stores query results | ✅ Directly queryable | Refreshes automatically | Real-time aggregation, cleaning, analysis; recommended as Sink input |
| Sink | ❌ Does not store; writes results to external systems | ❌ Not directly queryable | Outputs automatically | Writing results back to Kafka, databases, or object storage |
From this table, we can see that both Tables and MVs store data within RisingWave. The key difference is that a Table holds the current state of the data (which can be populated internally or from an external source), whereas an MV is the persisted result of a computation. A Source acts as a data entry point and does not store data, making it suitable for exploration and validation. A Sink does not retain data inside RisingWave but continuously outputs results.
[Figure: RisingWave objects relationship]

Connector support matrix

A common question from users is, “Can this Source be queried directly? Does a specific CDC connector require creating a table?” The matrix below summarizes the support for mainstream connectors.
| Connector Type | Direct Source Query | Direct Table Creation | Primary Key Required | Can Create MV | Can Be Sink Input |
| --- | --- | --- | --- | --- | --- |
| Kafka / Pulsar / Kinesis / NATS / MQTT / PubSub | ✅ | ✅ | ❌ | ✅ | ✅ |
| S3 / GCS / Azure Blob | ✅ | ✅ | ❌ | ✅ | ✅ |
| PostgreSQL CDC | ❌ (Requires Table) | ✅ | ✅ | ✅ (via Table) | ✅ (via Table) |
| MySQL CDC | ❌ (Requires Table) | ✅ | ✅ | ✅ (via Table) | ✅ (via Table) |
| SQL Server CDC | ❌ (Must create Source then Table) | ❌ | ✅ | ✅ (via Table) | ✅ (via Table) |
| MongoDB CDC | ❌ (Must create Source then Table) | ❌ | ✅ | ✅ (via Table) | ✅ (via Table) |
| Iceberg | ✅ | ❌ (Unless declared for writes) | ❌ | ✅ | ✅ |
This table makes the differences clear at a glance: Sources like Kafka, S3, and Iceberg can be queried directly, while CDC Sources must be materialized into a Table. CDC Tables must have a primary key defined, whereas Kafka/S3 Tables do not.

Table: the foundation of storage in RisingWave

A Table in RisingWave is a persistent data object that stores a collection of rows, behaving much like a traditional database table. It can store, query, and update data. There are two ways to use a Table. The first is as an internal table into which users insert data manually:
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR);
INSERT INTO users VALUES (1, 'Alice');
The second is as a connector-backed table that is bound to an external data source via a specified connector (e.g., Kafka, Postgres CDC, Iceberg):
CREATE TABLE events (
  user_id INT,
  action VARCHAR,
  ts TIMESTAMP
) WITH (
  connector = 'kafka',
  topic = 'user_events',
  properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON; -- assuming JSON-encoded messages
Regardless of the method, a Table stores data internally within RisingWave and maintains offsets/checkpoints to ensure fault tolerance and data consistency. Therefore, for any pipeline involving CDC or long-running jobs, a Table is indispensable.
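Because a Table behaves like a regular database table, you can also query and modify it with ordinary SQL. A quick sketch against the users table created above:
SELECT * FROM users WHERE id = 1;
UPDATE users SET name = 'Alicia' WHERE id = 1;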

Source: a lightweight ingress definition

Unlike a Table, a Source only defines how to connect to an external system and does not store any data within RisingWave. For Sources like Kafka, S3, and Iceberg, you can execute SELECT queries directly:
CREATE SOURCE kafka_src (
  user_id INT,
  action VARCHAR,
  ts TIMESTAMP
) WITH (
  connector = 'kafka',
  topic = 'user_events',
  properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON; -- assuming JSON-encoded messages

SELECT * FROM kafka_src LIMIT 10;
However, CDC Sources (e.g., Postgres/MySQL CDC) cannot be queried directly. This is because CDC streams contain update and delete operations, which must first be materialized into a Table before the changes can be applied correctly. The advantage of a Source lies in its lightweight and flexible nature, making it ideal for ad-hoc exploration or as an input for an MV. It’s important to note, however, that a Sink’s upstream is typically a Table or MV rather than a raw Source.
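For instance, here is a sketch of an MV built directly on the Kafka Source above (the aggregation itself is only illustrative):
CREATE MATERIALIZED VIEW actions_per_user AS
SELECT user_id, COUNT(*) AS action_cnt
FROM kafka_src
GROUP BY user_id;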

CDC scenarios: why a Table is mandatory

What makes a CDC (Change Data Capture) stream special is that it includes updates and deletes, and RisingWave must rely on a Table to process these events correctly:
  • Update semantics. If a user executes UPDATE users SET name='Bob' WHERE id=1 in PostgreSQL, RisingWave needs to know which row to update. Only a Table, which stores the data’s state and has a primary key, can support this.
  • Recoverability. A CDC stream is a continuous log of changes. With only a Source, consistency could not be restored after a task interruption; a Table can resume from the last checkpoint thanks to its persistence and offset tracking.
  • Transactional boundaries. A single database transaction might update multiple tables, and RisingWave relies on Tables to apply transactional boundaries correctly.
Therefore, for CDC you must create a Table. You can either create a CDC table directly or create a Source first and then derive a Table from it.
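As a sketch of the first option, a CDC-backed Table declares a primary key and the connection details inline (all connection values and the schema/table names below are placeholders):
CREATE TABLE users_cdc (
  id INT PRIMARY KEY,
  name VARCHAR
) WITH (
  connector = 'postgres-cdc',
  hostname = '127.0.0.1',
  port = '5432',
  username = 'user',
  password = 'pwd',
  database.name = 'mydb',
  schema.name = 'public',
  table.name = 'users'
);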

Shared Sources and consistency

For CDC scenarios, RisingWave provides a shared source mechanism. A user can create a single CDC Source and then derive multiple Tables from it:
CREATE SOURCE pg_src WITH (
  connector = 'postgres-cdc',
  hostname = '127.0.0.1',
  port = '5432',
  username = 'user',
  password = 'pwd',
  database.name = 'mydb'
);

CREATE TABLE users (...) FROM pg_src TABLE 'public.users';
CREATE TABLE orders (...) FROM pg_src TABLE 'public.orders';
[Figure: CDC handling in RisingWave]
The benefits of this approach are reduced configuration duplication and guaranteed cross-table transactional consistency, as all tables consume the same change log stream. If each table established a separate connection, the atomicity of cross-table transactions could be lost. Since v2.1, Kafka Sources also support the shared source feature. Unlike CDC’s shared source, which is used for transactional consistency, Kafka’s shared source primarily aims to avoid redundant data consumption by allowing multiple downstream consumers to share a single SourceExecutor, thereby improving efficiency.
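To illustrate the Kafka case with the kafka_src defined earlier, the sketch below builds two MVs on the same Source; with the shared source feature they read the topic through one shared SourceExecutor rather than two independent consumers (the queries themselves are illustrative):
CREATE MATERIALIZED VIEW clicks_per_user AS
SELECT user_id, COUNT(*) AS clicks
FROM kafka_src
WHERE action = 'click'
GROUP BY user_id;

CREATE MATERIALIZED VIEW actions_per_minute AS
SELECT date_trunc('minute', ts) AS m, COUNT(*) AS cnt
FROM kafka_src
GROUP BY m;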

Iceberg: queries, MVs, and time travel

Iceberg is a key table format supported by RisingWave. An Iceberg Source can be queried directly or used to build an MV. For example:
CREATE SOURCE iceberg_src
WITH (
  connector = 'iceberg',
  warehouse.path = 's3://warehouse',
  database.name = 'analytics',
  table.name = 'user_events',
  catalog.type = 'glue'
);

SELECT * FROM iceberg_src LIMIT 10;

CREATE MATERIALIZED VIEW daily_events AS
SELECT date_trunc('day', ts) d, COUNT(*) cnt
FROM iceberg_src
GROUP BY d;
In addition to direct queries and MV creation, Iceberg also supports time travel, allowing you to query historical data by version or timestamp. It also provides system tables (e.g., snapshots, files) for metadata inspection and operational management. Note that streaming ingestion is only applicable to append-only Iceberg tables.
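For time travel, here is a sketch against the Source above (the timestamp and snapshot ID are placeholders, and the exact clause may vary by version):
SELECT * FROM iceberg_src
FOR SYSTEM_TIME AS OF TIMESTAMPTZ '2024-01-01 00:00:00+00:00';

SELECT * FROM iceberg_src
FOR SYSTEM_VERSION AS OF 1234567890123456789;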

S3 and object storage

RisingWave supports reading data from object storage services like S3, GCS, and Azure Blob. Similar to Kafka or Iceberg, you can create a Source for direct querying or create a Table to persistently import the data. However, S3 Sources have a specific constraint: they do not guarantee the order of reads, nor do they guarantee resuming from the same position upon recovery. The system only ensures that every file will eventually be read completely. Therefore, if your application relies on strict ordering, you will need to handle it either in the upstream writing process or downstream consumption logic.
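A minimal sketch of an S3 Source, assuming JSON files; the bucket, region, and file pattern are placeholders, and credential options are omitted:
CREATE SOURCE s3_src (
  user_id INT,
  action VARCHAR,
  ts TIMESTAMP
) WITH (
  connector = 's3',
  s3.region_name = 'us-east-1',
  s3.bucket_name = 'my-bucket',
  match_pattern = 'events/*.json'
) FORMAT PLAIN ENCODE JSON;

SELECT * FROM s3_src LIMIT 10;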

Materialized View: the result table of real-time computation

The Materialized View (MV) is the core of real-time computation in RisingWave. It is defined by a SQL query, and the system automatically maintains the result table, refreshing it in real time as the underlying data changes.
CREATE MATERIALIZED VIEW daily_sales AS
SELECT date_trunc('day', ts) d, SUM(amount) total
FROM sales
GROUP BY d;
MVs provide low-latency queries and allow for the reuse of computation results, making them especially suitable for real-time aggregation, data cleaning, and analysis. In most cases, the upstream of a Sink will be an MV.
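Because an MV's results are stored and queryable, they can also be reused as the input of further computation. A sketch that stacks a weekly rollup on top of daily_sales:
CREATE MATERIALIZED VIEW weekly_sales AS
SELECT date_trunc('week', d) w, SUM(total) total
FROM daily_sales
GROUP BY w;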

Sink: multiple ways to output data

A Sink is the egress point of RisingWave, used to write data to external systems. The input for a Sink can be:
  • An existing object such as a Source, Table, or Materialized View, specified with the FROM clause of a CREATE SINK statement.
  • A query, defined with the CREATE SINK ... AS <select_query> syntax.
If you want to write results back into an internal RisingWave table, you can use the CREATE SINK INTO command. A common use case for this is to union multiple sources and write them into a single table.
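The sketches below show the three forms; the connector settings, topic names, and the all_events target table are placeholder assumptions, and the FORMAT options are indicative rather than the only valid choices:
-- FROM an existing object (here, the daily_sales MV)
CREATE SINK daily_sales_kafka FROM daily_sales
WITH (
  connector = 'kafka',
  topic = 'daily_sales',
  properties.bootstrap.server = 'broker:9092'
) FORMAT UPSERT ENCODE JSON (primary_key = 'd');

-- AS SELECT: define the sink directly from a query (assuming an append-only stream)
CREATE SINK big_sales AS
SELECT * FROM sales WHERE amount > 1000
WITH (
  connector = 'kafka',
  topic = 'big_sales',
  properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;

-- INTO: write results into another RisingWave table (all_events must already exist)
CREATE SINK merge_events INTO all_events FROM events;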

Summary

A Table is the storage layer; it is mandatory for CDC. A Source is a lightweight ingress; non-CDC Sources can be queried directly, while CDC Sources must be materialized into a Table. An MV is the result table of real-time computation and is the recommended upstream for a Sink. A Sink is the output layer, with three creation syntaxes: FROM, AS SELECT, and INTO. A shared source ensures cross-table consistency in CDC scenarios and avoids redundant consumption in Kafka scenarios. Iceberg supports Time Travel and system tables, while S3 guarantees completeness but not order. In one sentence: Source defines a data entry point → Table stores the state of the data → MV maintains the results of real-time computations → Sink delivers the results to downstream systems. Remember to use a Table for CDC scenarios and consider a shared source when you need multi-table consistency or want to reduce redundant data consumption.