Supported sources
Below is a complete list of source connectors in RisingWave. Click a connector name to see the SQL syntax, options, and a sample statement for connecting RisingWave to the connector.

| Connector | Version |
|---|---|
| Kafka | 3.1.0 or later |
| Redpanda | Latest |
| Pulsar | 2.8.0 or later |
| Kinesis | Latest |
| PostgreSQL CDC | 10, 11, 12, 13, 14 |
| MySQL CDC | 5.7, 8.0 |
| SQL Server CDC | 2019, 2022 |
| MongoDB CDC | |
| CDC via Kafka | |
| Google Pub/Sub | |
| Amazon S3 | Latest |
| Google Cloud Storage | Latest |
| Azure Blob | Latest |
| NATS JetStream | |
| MQTT | |
| Webhook | Built-in |
| Events API (HTTP) | External service |
| Apache Iceberg | |
| Load generator (datagen) | Built-in |
For details on using CREATE SOURCE or CREATE TABLE with each format, see Data formats and encoding options.
Continuous streaming ingestion
What it is: Real-time, continuous data ingestion from streaming sources that automatically updates as new data arrives.
When to use: For real-time analytics, event-driven applications, live dashboards, and when you need immediate data freshness.
Option: HTTP ingestion (Webhook / Events API)
If you want to ingest events over HTTP without introducing Kafka or another message broker, you can use one of these options:
- Webhook connector: RisingWave serves as the webhook destination and ingests requests into connector = 'webhook' tables (supports provider-style request validation/signatures). See Ingest data from webhook.
- Events API: Run a standalone service that ingests JSON/NDJSON over HTTP and can execute SQL over HTTP. See Events API.
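As a rough sketch of the webhook option, the statements below create a table that accepts HTTP requests and validates an HMAC signature header in the style GitHub uses. The secret name webhook_secret, the table name webhook_events, and the header name are illustrative assumptions; check the webhook connector page for the exact validation options your provider needs.

```sql
-- Hypothetical shared secret used to verify incoming requests.
CREATE SECRET webhook_secret WITH (backend = 'meta') AS 'replace-with-shared-secret';

-- Each request body is stored as one JSONB value in the data column.
CREATE TABLE webhook_events (
    data JSONB
) WITH (
    connector = 'webhook'
) VALIDATE SECRET webhook_secret AS secure_compare(
    headers->>'x-hub-signature-256',
    'sha256=' || encode(hmac(webhook_secret, data, 'sha256'), 'hex')
);
```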
Example: Kafka streaming ingestion
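A minimal sketch of a Kafka source, assuming JSON messages on a hypothetical topic named user_events and a broker reachable at localhost:9092. The column names are illustrative, not part of any real schema.

```sql
-- Hypothetical topic, broker address, and columns; adjust to your deployment.
CREATE SOURCE IF NOT EXISTS user_events (
    user_id    integer,
    event_type varchar,
    event_time timestamptz
)
WITH (
    connector = 'kafka',
    topic = 'user_events',
    properties.bootstrap.server = 'localhost:9092',
    scan.startup.mode = 'latest'  -- start from new messages only
) FORMAT PLAIN ENCODE JSON;
```

Using CREATE TABLE instead of CREATE SOURCE with the same options would persist the ingested rows in RisingWave rather than only reading them when a downstream query needs them.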
Example: Database CDC (Change Data Capture)
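A sketch of PostgreSQL CDC ingestion, assuming a hypothetical upstream database named shop with a public.orders table. The host, credentials, and columns are placeholders. A shared CDC source is created once, and each upstream table is then mapped to a RisingWave table.

```sql
-- Connection details are placeholders for illustration.
CREATE SOURCE pg_source WITH (
    connector = 'postgres-cdc',
    hostname = 'localhost',
    port = '5432',
    username = 'rw_user',
    password = 'rw_password',
    database.name = 'shop'
);

-- The table needs a primary key that matches the upstream table.
CREATE TABLE orders (
    order_id    integer PRIMARY KEY,
    customer_id integer,
    amount      numeric
) FROM pg_source TABLE 'public.orders';
```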
Example: Message queues (MQTT, NATS, Pulsar)
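A sketch for one of these message queues, MQTT, assuming a hypothetical broker at tcp://localhost:1883 publishing JSON readings on the topic sensors/temperature. NATS and Pulsar follow the same CREATE ... WITH (...) shape but with their own connector-specific options.

```sql
-- Broker URL, topic, and columns are illustrative assumptions.
CREATE TABLE sensor_readings (
    device_id   varchar,
    temperature double precision
)
WITH (
    connector = 'mqtt',
    url = 'tcp://localhost:1883',
    topic = 'sensors/temperature',
    qos = 'at_least_once'
) FORMAT PLAIN ENCODE JSON;
```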
One-time batch ingestion
What it is: Loading data once from external sources like databases, data lakes, or files.
When to use: For initial data loads, historical data import, or when you need to load static datasets.
Example: Batch load from a database
The postgres-cdc connector can be used to perform a one-time snapshot of a PostgreSQL table. For other databases, such as MySQL, you can use the corresponding CDC connector and set snapshot.mode to initial_only.
Example: Load from cloud storage (S3, GCS, Azure)
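A sketch of loading CSV files from Amazon S3, assuming a hypothetical bucket named example-bucket in us-east-1. The credentials are placeholders and the columns are illustrative. GCS and Azure Blob use their own connector names and credential options.

```sql
-- Bucket, region, credentials, and columns are placeholders.
CREATE TABLE s3_customers (
    id   integer,
    name varchar,
    age  integer
)
WITH (
    connector = 's3',
    s3.region_name = 'us-east-1',
    s3.bucket_name = 'example-bucket',
    s3.credentials.access = 'xxxxx',
    s3.credentials.secret = 'xxxxx',
    match_pattern = '*.csv'  -- only read objects matching this pattern
) FORMAT PLAIN ENCODE CSV (
    without_header = 'false',
    delimiter = ','
);
```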
Example: Load from a data lake (Iceberg)
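A sketch of reading an Iceberg table, assuming a storage catalog on S3 with a hypothetical warehouse path, database, and table name. The exact catalog options depend on your catalog type, so treat the parameter set below as a starting point, not a complete configuration.

```sql
-- Warehouse path, credentials, and names are placeholders.
CREATE SOURCE iceberg_orders WITH (
    connector = 'iceberg',
    warehouse.path = 's3://example-bucket/warehouse',
    s3.region = 'us-east-1',
    s3.access.key = 'xxxxx',
    s3.secret.key = 'xxxxx',
    catalog.type = 'storage',
    database.name = 'analytics',
    table.name = 'orders'
);

-- Batch read of the Iceberg table's current contents.
SELECT * FROM iceberg_orders LIMIT 10;
```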
Periodic ingestion with external orchestration
What it is: RisingWave doesn’t have a built-in scheduler, but you can achieve periodic ingestion using external orchestration tools like Cron or Airflow.
When to use: For scheduled data updates, daily/hourly batch processing, or when you need precise control over ingestion timing.
Example: Incremental loading pattern
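The statements below are a minimal sketch of incremental loading driven by an external scheduler such as Cron or Airflow. The tables load_control, orders_staging, and orders, the updated_at column, and the cutoff timestamp are all hypothetical. The scheduler is assumed to choose the cutoff and substitute it into both statements on every run.

```sql
-- Hypothetical control table: one row per periodic job.
CREATE TABLE load_control (
    job_name       varchar PRIMARY KEY,
    last_load_time timestamptz
);
INSERT INTO load_control VALUES ('orders', '1970-01-01 00:00:00+00');

-- Run on each cycle: copy only rows changed since the last load and up to
-- the cutoff chosen by the scheduler. orders_staging stands in for whatever
-- batch-readable relation exposes the external data.
INSERT INTO orders
SELECT order_id, customer_id, amount, updated_at
FROM orders_staging
WHERE updated_at > (
        SELECT last_load_time FROM load_control WHERE job_name = 'orders'
      )
  AND updated_at <= '2024-01-02 00:00:00+00';

-- Advance the watermark to the same cutoff.
UPDATE load_control
SET last_load_time = '2024-01-02 00:00:00+00'
WHERE job_name = 'orders';
```

Bounding each run by an explicit cutoff keeps rows that arrive while the job is running from being skipped by the next cycle.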
A common pattern is to use a control table to track the last load time and only ingest new data.
Other ingestion methods
Direct data insertion
You can always insert data directly into a standard table using the INSERT statement.
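For instance, assuming a hypothetical customers table:

```sql
-- Table and values are illustrative.
CREATE TABLE IF NOT EXISTS customers (
    id   integer PRIMARY KEY,
    name varchar
);

INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob');

-- Commit pending changes so that later queries see the new rows.
FLUSH;
```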
Test data generation
For development and testing, you can use the built-in datagen connector to generate mock data streams.
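A small sketch of a datagen table, with illustrative column names, that produces a sequential ID plus random values at roughly ten rows per second:

```sql
-- Column names and value ranges are arbitrary choices for illustration.
CREATE TABLE mock_orders (
    order_id integer,
    amount   double precision,
    note     varchar
)
WITH (
    connector = 'datagen',
    fields.order_id.kind = 'sequence',
    fields.order_id.start = '1',
    fields.order_id.end = '1000',
    fields.amount.kind = 'random',
    fields.amount.min = '1',
    fields.amount.max = '500',
    fields.note.kind = 'random',
    fields.note.length = '16',
    datagen.rows.per.second = '10'
) FORMAT PLAIN ENCODE JSON;
```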
Ingestion method support matrix
| Data Source | Continuous Streaming | One-Time Batch | Periodic | Notes |
|---|---|---|---|---|
| Apache Kafka | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| Redpanda | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| Apache Pulsar | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| AWS Kinesis | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| Google Pub/Sub | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| NATS JetStream | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| MQTT | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| PostgreSQL CDC | ✅ | ✅ | ⚠️ | CDC for streaming; direct connection for batch |
| MySQL CDC | ✅ | ✅ | ⚠️ | CDC for streaming; direct connection for batch |
| SQL Server CDC | ✅ | ✅ | ⚠️ | CDC for streaming; direct connection for batch |
| MongoDB CDC | ✅ | ✅ | ⚠️ | CDC for streaming; direct connection for batch |
| AWS S3 | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| Google Cloud Storage | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| Azure Blob | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| Apache Iceberg | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| Datagen | ✅ | ❌ | ❌ | Test data generation only |
| Direct INSERT | ❌ | ✅ | ⚠️ | Manual insertion; periodic via external tools |
| Webhook | ✅ | ✅ | ⚠️ | Push-based HTTP ingestion; best for SaaS webhooks that need request validation/signatures |
| Events API (HTTP) | ✅ | ✅ | ⚠️ | Run Events API service; supports NDJSON ingestion and SQL over HTTP |
- ✅ Natively Supported: Built-in support for this ingestion method.
- ❌ Not Supported: This ingestion method is not available for this source.
- ⚠️ External Tools Required: Requires external orchestration tools (e.g., Cron, Airflow).
Best practices
Choose the right method
- Streaming: Use for real-time requirements and continuous data flows.
- Batch: Use for historical data, large one-time loads, or static datasets.
- Periodic: Use for scheduled updates with external orchestration tools.
Performance considerations
- Streaming ingestion gives the freshest data, because rows are processed as they arrive.
- Batch loading is efficient for large datasets.
- Use materialized views to pre-compute and store results for fast querying.
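To illustrate the materialized view recommendation, here is a sketch that assumes a hypothetical orders table or source with customer_id and amount columns. RisingWave keeps the aggregated result up to date as new rows are ingested, so the final query reads precomputed values.

```sql
-- Assumes a relation named orders exists (hypothetical schema).
CREATE MATERIALIZED VIEW order_totals AS
SELECT
    customer_id,
    COUNT(*)    AS order_count,
    SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id;

-- Reads the maintained result instead of re-aggregating the raw rows.
SELECT * FROM order_totals WHERE customer_id = 42;
```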
Data consistency
- CDC provides high-fidelity replication of database changes.
- For message queues, understand the delivery guarantees (e.g., at-least-once) of your system.
- Use transactions for atomic operations when inserting data manually.
- Monitor data quality and set up alerts.
Monitoring and operations
- Monitor streaming lag for real-time sources to ensure data freshness.
- Track batch job success and failure rates.
- Set up alerts for data quality issues.
- Use RisingWave’s system tables and dashboards for monitoring.
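As a starting point for inspecting what is currently defined, the following commands list sources and materialized views. The rw_catalog query is a sketch, and the set of columns it returns may differ between versions.

```sql
-- List the sources and materialized views in the current schema.
SHOW SOURCES;
SHOW MATERIALIZED VIEWS;

-- Inspect source metadata through the system catalog.
SELECT * FROM rw_catalog.rw_sources;
```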