Overview of data ingestion in RisingWave

RisingWave lets you connect to various external data sources, ingest data for real-time processing, and make the results available to applications. This overview explains the core concepts and primary methods for working with data in RisingWave.

Key concepts

Source: A connection to an external data source (for example, Kafka, PostgreSQL, or S3). Creating a source, by itself, does not store data within RisingWave.
Table: Represents data stored within RisingWave’s internal storage. There are two main types:
- Standard (batch) tables: Created using a regular CREATE TABLE statement without a WITH clause. You populate these tables manually (for example, using INSERT statements).
  CREATE TABLE my_batch_table ( id INT PRIMARY KEY, name VARCHAR );
- Streaming (connector-backed) tables: Created using a CREATE TABLE ... WITH (connector=...) statement. These tables are automatically and continuously populated by ingesting data from an external source.
  CREATE TABLE my_streaming_table ( id INT PRIMARY KEY, value VARCHAR ) WITH ( connector='kafka', topic='my_topic', properties.bootstrap.server='localhost:9092' ) FORMAT PLAIN ENCODE JSON;
Connector: A built-in component that enables RisingWave to interface with a specific type of data source or sink (for example, Kafka or PostgreSQL).
Materialized view: A continuously updated view of data. Unlike a regular (non-materialized) view, which is just a stored query, a materialized view in RisingWave stores the results of the query and automatically updates those results as new data arrives from the underlying source(s) or table(s). Materialized views are streaming jobs.
Sink: A streaming job that continuously delivers data from RisingWave to an external system (for example, Kafka, PostgreSQL, or cloud storage). Creating a sink is optional. Applications can also directly query data within RisingWave.

Primary ingestion methods

RisingWave provides two primary commands for connecting to external data sources:

CREATE SOURCE (data not stored):
- Description: Establishes a connection to a data source without storing the data permanently in RisingWave.
- Use cases:
  - Quickly inspecting data from a source.
  - Building streaming pipelines using materialized views and/or sinks.
  - Running one-time queries against the source.
- Data availability: Data is typically available only while your RisingWave session is active (the exact behavior depends on the specific source connector).
- Learn more: Connecting with CREATE SOURCE
CREATE TABLE ... WITH (connector=...) (data stored, continuous ingestion):
- Description: Creates a streaming (connector-backed) table. This establishes a connection to a data source and continuously ingests the data into RisingWave’s internal storage.
- Use cases: Historical analysis, improved query performance, primary key support, CDC handling.
- Data availability: Data is stored and remains available.
- Learn More: Connecting with CREATE TABLE

Choosing a method: Use CREATE SOURCE for temporary connections, quick exploration, or to build streaming pipelines with materialized views and/or sinks. Use CREATE TABLE ... WITH (connector=...) for continuous ingestion when you need to store the full raw data for historical analysis or other benefits.

Typical data flow in RisingWave

A common pattern in RisingWave is to:

Ingest data from external sources using CREATE SOURCE or CREATE TABLE ... WITH (connector=...).
(Optional) Create materialized views to transform the data. Materialized views are streaming jobs that continuously process incoming data from sources or tables and store the results in RisingWave. This is an efficient way to build streaming pipelines.
(Optional) Create sinks using CREATE SINK to deliver processed data to external systems. Sinks are also streaming jobs that continuously send data to their destination. Applications can also connect directly to RisingWave and query the results of materialized views or tables without a sink.

Copying data from a source to a table

You can create a connection with CREATE SOURCE and then manually copy data into a standard (batch) table using INSERT INTO ... SELECT:

-- Create a connection (data not stored)
CREATE SOURCE my_source ...;

-- Create a standard table
CREATE TABLE my_batch_table (...);

-- Copy data from the source to the table
INSERT INTO my_batch_table SELECT * FROM my_source;

This is useful for one-time data loading or transformations before storing data.

Supported data sources

RisingWave supports a wide range of data sources, including:

Message queues: Apache Kafka, Redpanda, Pulsar
Databases (via CDC): PostgreSQL, MySQL, SQL Server, MongoDB
Cloud storage: Amazon S3, Google Cloud Storage, Azure Blob Storage
Others: NATS JetStream, MQTT, Apache Iceberg

See Supported data sources for a complete list. RisingWave uses connectors to interface with each source.

Supported data formats and encodings

RisingWave supports various data formats and encodings. See Data formats and encoding options for details.

Next steps

Explore the Sources and Data formats and encoding options.
Learn how to connect to specific sources using the individual connector guides (e.g., Connect to Kafka).
Learn more about using CREATE SOURCE and CREATE TABLE.

Get started

Working with data

Deploy & operate

Performance

Troubleshooting

Reference

Cloud

Overview of data ingestion in RisingWave

Key concepts

Primary ingestion methods

Typical data flow in RisingWave

Copying data from a source to a table

Supported data sources

Supported data formats and encodings

Next steps

Get started

Working with data

Deploy & operate

Performance

Troubleshooting

Reference

Cloud

​Key concepts

​Primary ingestion methods

​Typical data flow in RisingWave

​Copying data from a source to a table

​Supported data sources

​Supported data formats and encodings

​Next steps

Key concepts

Primary ingestion methods

Typical data flow in RisingWave

Copying data from a source to a table

Supported data sources

Supported data formats and encodings

Next steps