Overview of data ingestion in RisingWave
Learn about connecting to data sources and ingesting data into RisingWave.
RisingWave lets you connect to various external data sources, ingest data for real-time processing, and make the results available to applications. This overview explains the core concepts and primary methods for working with data in RisingWave.
Key concepts
- Source: A connection to an external data source (for example, Kafka, PostgreSQL, or S3). Creating a source, by itself, does not store data within RisingWave.
- Table: Represents data stored within RisingWave’s internal storage. There are two main types:
- Standard (batch) tables: Created using a regular
CREATE TABLE
statement without aWITH
clause. You populate these tables manually (for example, usingINSERT
statements). - Streaming (connector-backed) tables: Created using a
CREATE TABLE ... WITH (connector=...)
statement. These tables are automatically and continuously populated by ingesting data from an external source.
- Standard (batch) tables: Created using a regular
- Connector: A built-in component that enables RisingWave to interface with a specific type of data source or sink (for example, Kafka or PostgreSQL).
- Materialized view: A continuously updated view of data. Unlike a regular (non-materialized) view, which is just a stored query, a materialized view in RisingWave stores the results of the query and automatically updates those results as new data arrives from the underlying source(s) or table(s). Materialized views are streaming jobs.
- Sink: A streaming job that continuously delivers data from RisingWave to an external system (for example, Kafka, PostgreSQL, or cloud storage). Creating a sink is optional. Applications can also directly query data within RisingWave.
Primary ingestion methods
RisingWave provides two primary commands for connecting to external data sources:
-
CREATE SOURCE
(data not stored):- Description: Establishes a connection to a data source without storing the data permanently in RisingWave.
- Use cases:
- Quickly inspecting data from a source.
- Building streaming pipelines using materialized views and/or sinks.
- Running one-time queries against the source.
- Data availability: Data is typically available only while your RisingWave session is active (the exact behavior depends on the specific source connector).
- Learn more: Connecting with
CREATE SOURCE
-
CREATE TABLE ... WITH (connector=...)
(data stored, continuous ingestion):- Description: Creates a streaming (connector-backed) table. This establishes a connection to a data source and continuously ingests the data into RisingWave’s internal storage.
- Use cases: Historical analysis, improved query performance, primary key support, CDC handling.
- Data availability: Data is stored and remains available.
- Learn More: Connecting with
CREATE TABLE
Choosing a method: Use CREATE SOURCE
for temporary connections, quick exploration, or to build streaming pipelines with materialized views and/or sinks. Use CREATE TABLE ... WITH (connector=...)
for continuous ingestion when you need to store the full raw data for historical analysis or other benefits.
Typical data flow in RisingWave
A common pattern in RisingWave is to:
- Ingest data from external sources using
CREATE SOURCE
orCREATE TABLE ... WITH (connector=...)
. - (Optional) Create materialized views to transform the data. Materialized views are streaming jobs that continuously process incoming data from sources or tables and store the results in RisingWave. This is an efficient way to build streaming pipelines.
- (Optional) Create sinks using
CREATE SINK
to deliver processed data to external systems. Sinks are also streaming jobs that continuously send data to their destination. Applications can also connect directly to RisingWave and query the results of materialized views or tables without a sink.
Copying data from a source to a table
You can create a connection with CREATE SOURCE
and then manually copy data into a standard (batch) table using INSERT INTO ... SELECT
:
This is useful for one-time data loading or transformations before storing data.
Supported data sources
RisingWave supports a wide range of data sources, including:
- Message queues: Apache Kafka, Redpanda, Pulsar
- Databases (via CDC): PostgreSQL, MySQL, SQL Server, MongoDB
- Cloud storage: Amazon S3, Google Cloud Storage, Azure Blob Storage
- Others: NATS JetStream, MQTT, Apache Iceberg
See Supported data sources for a complete list. RisingWave uses connectors to interface with each source.
Supported data formats and encodings
RisingWave supports various data formats and encodings. See Data formats and encoding options for details.
Next steps
- Explore the Sources and Data formats and encoding options.
- Learn how to connect to specific sources using the individual connector guides (e.g., Connect to Kafka).
- Learn more about using
CREATE SOURCE
andCREATE TABLE
.
Was this page helpful?