What is a source?
A source in RisingWave is a connection to an external data system that streams data into the database. Sources define where data comes from and how it should be parsed — they are the entry point for all data ingestion in RisingWave. When you create a source with CREATE SOURCE, RisingWave establishes a connection to the external system and begins reading data according to the specified format and encoding. Sources do not persist data inside RisingWave — they provide a live window into the external data stream.
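As a concrete illustration, here is a minimal CREATE SOURCE sketch for a Kafka topic carrying JSON events. The source name, topic, broker address, and column names are hypothetical placeholders, not values from this document:

```sql
-- Connect to a (hypothetical) Kafka topic; no data is persisted in RisingWave.
CREATE SOURCE user_events (
    user_id    INT,
    event_type VARCHAR,
    event_ts   TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'user_events',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;
```

The FORMAT and ENCODE clauses tell RisingWave how to parse each message; the WITH clause carries connector-specific settings.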
How sources work
Once created, a source can be:
- Queried directly (for supported connectors like Kafka, S3, Iceberg) — useful for ad-hoc exploration and validation.
- Referenced in materialized views — to build continuous streaming pipelines.
- Used as input for sinks — to transform and deliver data to downstream systems.
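These usage patterns can be sketched in SQL. The source name `user_events` and the column names here are assumed, not defined in this document:

```sql
-- Ad-hoc exploration: query the source directly
-- (supported for connectors such as Kafka, S3, and Iceberg).
SELECT * FROM user_events LIMIT 10;

-- Continuous pipeline: a materialized view over the source
-- is incrementally maintained as new events arrive.
CREATE MATERIALIZED VIEW event_counts AS
SELECT event_type, COUNT(*) AS cnt
FROM user_events
GROUP BY event_type;
```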
Source vs. Table
A common question is when to use a source versus a table with connector settings. The key difference is data persistence:

| | Source | Table (with connector) |
|---|---|---|
| Persists data in RisingWave | No | Yes |
| Supports DML (INSERT, UPDATE, DELETE) | No | Yes |
| Works with CDC connectors | No — must use a table | Yes |
| Primary key required | No | Required for CDC |
| Best for | Exploration, stateless pipelines | Persistent storage, CDC, updates |
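The persistent alternative looks almost identical in DDL: CREATE TABLE with a connector both ingests and stores the data, and allows DML. This is a sketch with hypothetical names (table, topic, broker address):

```sql
-- A connector-backed table: ingests from Kafka AND persists rows,
-- so it supports UPDATE/DELETE and serves historical queries.
CREATE TABLE users (
    id   INT PRIMARY KEY,  -- a primary key is required for CDC/upsert workloads
    name VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'users',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;
```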
Supported source connectors
RisingWave supports a wide range of source connectors:
- Message brokers: Apache Kafka, Apache Pulsar, Amazon Kinesis, MQTT, NATS, Google Pub/Sub
- Databases (CDC): PostgreSQL, MySQL, SQL Server, MongoDB — via native CDC connectors, no Kafka or Debezium required
- Object storage: Amazon S3, Google Cloud Storage, Azure Blob Storage
- Data lakes: Apache Iceberg
- Other: Webhooks, Datagen (for testing)

For the full list, see Supported source connectors.
Data formats and encoding
Sources support multiple data formats:
- JSON — most common; schema inferred or specified in DDL
- Avro — requires schema registry URL (Kafka sources)
- Protobuf — requires schema specification
- CSV — for file-based sources
- Bytes — raw bytes for custom parsing
- Debezium JSON / Maxwell JSON — for CDC events from Kafka topics
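For schema-registry-backed formats such as Avro, the encoding clause points at the registry instead of listing columns in the DDL. A sketch, with a hypothetical topic and registry URL:

```sql
-- Avro-encoded Kafka source: the column list is derived from the
-- schema registry rather than declared inline.
CREATE SOURCE avro_events
WITH (
    connector = 'kafka',
    topic = 'events',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE AVRO (
    schema.registry = 'http://localhost:8081'
);
```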
Related topics
- CREATE SOURCE — SQL reference
- Ingestion overview — Connector list and ingestion patterns
- Source, Table, MV, and Sink — Core object comparison
- CDC with RisingWave — Native CDC connectors