Need help generating SQL? Use Claude Code or Cursor with the RisingWave MCP server to generate and run SQL interactively.

RisingWave offers several methods for data ingestion, each tailored to different use cases. This guide covers the main patterns to help you choose the best approach for your needs. For a detailed comparison of core objects like Source, Table, Materialized View, and Sink, see our guide on Source, Table, MV, and Sink.
Supported sources
Below is a complete list of source connectors in RisingWave. Click a connector name to see the SQL syntax, options, and a sample statement for connecting RisingWave to the connector.

| Connector | Version |
|---|---|
| Kafka | 3.1.0 or later |
| Redpanda | Latest |
| Pulsar | 2.8.0 or later |
| Kinesis | Latest |
| PostgreSQL CDC | 10, 11, 12, 13, 14 |
| MySQL CDC | 5.7, 8.0 |
| SQL Server CDC | 2019, 2022 |
| MongoDB CDC | |
| CDC via Kafka | |
| Google Pub/Sub | |
| Amazon S3 | Latest |
| Google Cloud Storage | Latest |
| Azure Blob | Latest |
| NATS JetStream | |
| MQTT | |
| Webhook | Built-in |
| Events API (HTTP) | External service |
| Apache Iceberg | |
| Snowflake | Latest |
| Load generator (datagen) | Built-in |
For details on using `CREATE SOURCE` or `CREATE TABLE` with each format, see Data formats and encoding options.
Continuous streaming ingestion
What it is: Real-time, continuous data ingestion from streaming sources that automatically updates as new data arrives.

When to use: For real-time analytics, event-driven applications, live dashboards, and when you need immediate data freshness.

Option: HTTP ingestion (Webhook / Events API)
If you want to ingest events over HTTP without introducing Kafka or another message broker, you can use one of these options:

- Webhook connector: RisingWave serves as the webhook destination and ingests requests into `connector = 'webhook'` tables (supports provider-style request validation/signatures). See Ingest data from webhook.
- Events API: Run a standalone service that ingests JSON/NDJSON over HTTP and can execute SQL over HTTP. See Events API.
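As a sketch, a webhook-backed table might look like the following. The secret name, header name, and validation expression here are illustrative; consult the webhook guide for the exact validation syntax your provider requires.

```sql
-- A table that receives raw webhook payloads over HTTP.
-- The VALIDATE clause below is illustrative: adjust the header
-- name and secret to match your webhook provider.
CREATE TABLE webhook_events (
  data JSONB
) WITH (
  connector = 'webhook'
) VALIDATE SECRET my_webhook_secret
  AS secure_compare(headers->>'x-signature', my_webhook_secret);
```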
Example: Kafka streaming ingestion
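A minimal Kafka source might look like the following sketch; the topic, broker address, and column schema are placeholders for your own setup.

```sql
-- Connect to a Kafka topic and continuously ingest JSON events.
CREATE SOURCE kafka_events (
  user_id INT,
  event_type VARCHAR,
  event_time TIMESTAMPTZ
) WITH (
  connector = 'kafka',
  topic = 'user_events',                        -- placeholder topic
  properties.bootstrap.server = 'broker:9092',  -- placeholder broker
  scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```

New messages on the topic flow into the source automatically; downstream materialized views update as data arrives.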
Example: Database CDC (Change Data Capture)
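A PostgreSQL CDC sketch is shown below; all connection details are placeholders, and MySQL, SQL Server, and MongoDB follow the same shape with their corresponding connectors.

```sql
-- Replicate a PostgreSQL table into RisingWave via CDC.
-- The table stays in sync with inserts, updates, and deletes upstream.
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  amount NUMERIC
) WITH (
  connector = 'postgres-cdc',
  hostname = 'db.example.com',  -- placeholder connection details
  port = '5432',
  username = 'rw_user',
  password = 'secret',
  database.name = 'shop',
  schema.name = 'public',
  table.name = 'orders'
);
```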
Example: Message queues (MQTT, NATS, Pulsar)
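Message-queue connectors follow the same pattern; an MQTT sketch is below, with the broker URL and topic as placeholders.

```sql
-- Ingest sensor readings from an MQTT topic.
CREATE SOURCE sensor_readings (
  device_id VARCHAR,
  temperature DOUBLE PRECISION
) WITH (
  connector = 'mqtt',
  url = 'tcp://broker.example.com:1883',  -- placeholder broker
  topic = 'sensors/temperature'
) FORMAT PLAIN ENCODE JSON;
```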
One-time batch ingestion
What it is: Loading data once from external sources like databases, data lakes, or files.

When to use: For initial data loads, historical data import, or when you need to load static datasets.

Example: Batch load from a database
The `postgres-cdc` connector can be used to perform a one-time snapshot of a PostgreSQL table. For other databases, such as MySQL, you can use the corresponding CDC connector and set `snapshot.mode` to `initial_only`.
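For instance, a one-time MySQL snapshot might be sketched like this; connection details are placeholders.

```sql
-- Take a one-time snapshot of a MySQL table without ongoing CDC.
CREATE TABLE customers_snapshot (
  id INT PRIMARY KEY,
  name VARCHAR
) WITH (
  connector = 'mysql-cdc',
  hostname = 'mysql.example.com',  -- placeholder connection details
  port = '3306',
  username = 'rw_user',
  password = 'secret',
  database.name = 'shop',
  table.name = 'customers',
  snapshot.mode = 'initial_only'   -- snapshot once, no change stream
);
```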
Example: Load from cloud storage (S3, GCS, Azure)
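An S3 load can be sketched as follows; the bucket, match pattern, and credentials are placeholders, and GCS and Azure Blob use their own connectors and option names.

```sql
-- Load JSON files matching a pattern from an S3 bucket.
CREATE TABLE s3_import (
  id INT,
  payload VARCHAR
) WITH (
  connector = 's3',
  s3.bucket_name = 'my-bucket',       -- placeholder bucket
  s3.region_name = 'us-east-1',
  match_pattern = 'exports/*.json',
  s3.credentials.access = 'AKIA...',  -- placeholder credentials
  s3.credentials.secret = '...'
) FORMAT PLAIN ENCODE JSON;
```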
Example: Load from a data lake (Iceberg)
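An Iceberg source can be sketched as below. The warehouse path and names are placeholders, and the exact catalog options depend on your catalog type; no column list is needed because the schema is derived from the Iceberg table.

```sql
-- Read an existing Iceberg table; columns are derived from its schema.
CREATE SOURCE iceberg_src WITH (
  connector = 'iceberg',
  warehouse.path = 's3://my-bucket/warehouse',  -- placeholder path
  database.name = 'analytics',
  table.name = 'events',
  catalog.type = 'storage'
);
```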
Periodic ingestion with external orchestration
What it is: RisingWave doesn’t have a built-in scheduler, but you can achieve periodic ingestion using external orchestration tools like Cron or Airflow.

When to use: For scheduled data updates, daily/hourly batch processing, or when you need precise control over ingestion timing.

Example: Incremental loading pattern
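One way to implement incremental loading, with the SQL below run on a schedule by an external orchestrator such as Airflow or cron (all table and column names are hypothetical):

```sql
-- Control table holding the high-water mark for each feed.
CREATE TABLE load_control (
  source_name VARCHAR PRIMARY KEY,
  last_loaded_at TIMESTAMPTZ
);

-- On each scheduled run: copy only rows newer than the last load,
-- then advance the high-water mark.
INSERT INTO orders_target
SELECT * FROM orders_staging
WHERE updated_at > (
  SELECT last_loaded_at FROM load_control
  WHERE source_name = 'orders'
);

UPDATE load_control
SET last_loaded_at = now()
WHERE source_name = 'orders';
```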
A common pattern is to use a control table to track the last load time and only ingest new data.

Other ingestion methods
Direct data insertion
You can always insert data directly into a standard table using the `INSERT` statement.
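For example, with an illustrative table and values:

```sql
-- Create a standard table and insert rows manually.
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
```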
Test data generation
For development and testing, you can use the built-in `datagen` connector to generate mock data streams.
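A datagen sketch that emits ten mock rows per second is shown below; the column names and value ranges are arbitrary choices for illustration.

```sql
-- Generate a mock stream: sequential ids and random temperatures.
CREATE TABLE mock_readings (
  id INT,
  temperature DOUBLE PRECISION
) WITH (
  connector = 'datagen',
  fields.id.kind = 'sequence',
  fields.id.start = '1',
  fields.temperature.kind = 'random',
  fields.temperature.min = '0',
  fields.temperature.max = '40',
  datagen.rows.per.second = '10'
) FORMAT PLAIN ENCODE JSON;
```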
Ingestion method support matrix
| Data Source | Continuous Streaming | One-Time Batch | Periodic | Notes |
|---|---|---|---|---|
| Apache Kafka | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| Redpanda | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| Apache Pulsar | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| AWS Kinesis | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| Google Pub/Sub | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| NATS JetStream | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| MQTT | ✅ | ❌ | ⚠️ | Streaming only; periodic via external tools |
| PostgreSQL CDC | ✅ | ✅ | ⚠️ | CDC for streaming; direct connection for batch |
| MySQL CDC | ✅ | ✅ | ⚠️ | CDC for streaming; direct connection for batch |
| SQL Server CDC | ✅ | ✅ | ⚠️ | CDC for streaming; direct connection for batch |
| MongoDB CDC | ✅ | ✅ | ⚠️ | CDC for streaming; direct connection for batch |
| AWS S3 | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| Google Cloud Storage | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| Azure Blob | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| Apache Iceberg | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| Snowflake | ❌ | ✅ | ✅ | Manual one-time load via REFRESH TABLE; periodic refresh when refresh_interval_sec is set |
| Datagen | ✅ | ❌ | ❌ | Test data generation only |
| Direct INSERT | ❌ | ✅ | ⚠️ | Manual insertion; periodic via external tools |
| Webhook | ✅ | ✅ | ⚠️ | Push-based HTTP ingestion; best for SaaS webhooks + request validation/signatures |
| Events API (HTTP) | ✅ | ✅ | ⚠️ | Run Events API service; supports NDJSON ingestion and SQL over HTTP |
- ✅ Natively Supported: Built-in support for this ingestion method.
- ❌ Not Supported: This ingestion method is not available for this source.
- ⚠️ External Tools Required: Requires external orchestration tools (e.g., Cron, Airflow).
Best practices
Choose the right method
- Streaming: Use for real-time requirements and continuous data flows.
- Batch: Use for historical data, large one-time loads, or static datasets.
- Periodic: Use for scheduled updates with external orchestration tools.
Performance considerations
- Streaming ingestion offers the best real-time performance.
- Batch loading is efficient for large datasets.
- Use materialized views to pre-compute and store results for fast querying.
Data consistency
- CDC provides high-fidelity replication of database changes.
- For message queues, understand the delivery guarantees (e.g., at-least-once) of your system.
- Use transactions for atomic operations when inserting data manually.
- Monitor data quality and set up alerts.
Monitoring and operations
- Monitor streaming lag for real-time sources to ensure data freshness.
- Track batch job success and failure rates.
- Set up alerts for data quality issues.
- Use RisingWave’s system tables and dashboards for monitoring.
See also
- What is a Source? — Source concepts, connectors, and source vs. table
- What is CDC? — Native CDC connectors for PostgreSQL, MySQL, SQL Server, MongoDB
- Data processing — Transform ingested data with materialized views
- Data delivery — Send processed results to downstream systems
- Source, Table, MV, and Sink — Core object comparison