RisingWave supports delivering data to downstream systems via its sink connectors.
| Sink Connector | Connector Parameter |
|---|---|
| Apache Doris | connector = 'doris' |
| Apache Iceberg | connector = 'iceberg' |
| AWS Kinesis | connector = 'kinesis' |
| Cassandra and ScyllaDB | connector = 'cassandra' |
| ClickHouse | connector = 'clickhouse' |
| CockroachDB | connector = 'jdbc' |
| Delta Lake | connector = 'deltalake' |
| Elasticsearch | connector = 'elasticsearch' |
| Google BigQuery | connector = 'bigquery' |
| Google Pub/Sub | connector = 'google_pubsub' |
| JDBC: MySQL, PostgreSQL, TiDB | connector = 'jdbc' |
| Kafka | connector = 'kafka' |
| MQTT | connector = 'mqtt' |
| NATS | connector = 'nats' |
| Pulsar | connector = 'pulsar' |
| Redis | connector = 'redis' |
| Snowflake | connector = 'snowflake' |
| StarRocks | connector = 'starrocks' |
| Microsoft SQL Server | connector = 'sqlserver' |
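A sink is created with a CREATE SINK statement that names one of the connectors above in its WITH clause. A minimal sketch, assuming an existing materialized view my_mv and a Kafka broker at a placeholder address (the exact WITH parameters vary per connector; see the guide for each connector):

```sql
CREATE SINK IF NOT EXISTS my_kafka_sink
FROM my_mv
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'broker:9092', -- placeholder broker address
    topic = 'my_topic'                           -- placeholder topic name
)
FORMAT PLAIN ENCODE JSON;
```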
The sink_decouple session variable can be specified to enable or disable sink decoupling. The default value for the session variable is default.

To enable sink decoupling for all sinks created in the session, set sink_decouple to true or enable.

To disable sink decoupling for all sinks created in the session, set sink_decouple to false or disable, regardless of the default setting.
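For example, the variable can be set within a session like so:

```sql
-- Enable sink decoupling for sinks created later in this session
SET sink_decouple = true;

-- Disable sink decoupling, overriding the cluster default
SET sink_decouple = false;
```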
The system table rw_sink_decouple is provided to query whether a created sink has enabled sink decoupling or not.
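A quick check might look like the following; this sketch assumes the table is exposed under the rw_catalog schema and simply selects all columns, since the exact column set is not listed here:

```sql
SELECT * FROM rw_catalog.rw_sink_decouple;
```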
Sinks can deliver data in three formats: upsert, append-only, and debezium. To determine which data formats are supported by each sink connector, please refer to the detailed guides listed above.
In the upsert sink, a non-null value updates the last value for the same key or inserts a new value if the key doesn’t exist. A NULL value indicates the deletion of the corresponding key.

When creating an upsert sink, note whether or not you need to specify the primary key in the following situations.
You must specify the primary_key field when creating an upsert JDBC sink.

Files written by a file sink can be read back in batch queries via the file_scan
API. You can also leverage third-party OLAP query engines to enhance data processing capabilities.
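To illustrate the primary-key note above, here is a hedged sketch of an upsert sink on Kafka; the broker address, topic, and primary_key column are placeholders:

```sql
CREATE SINK my_upsert_sink
FROM my_mv
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'broker:9092', -- placeholder broker address
    topic = 'my_topic',                          -- placeholder topic name
    primary_key = 'id'                           -- key column(s) driving upsert semantics
)
FORMAT UPSERT ENCODE JSON;
```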
Below is an example to sink data to S3:
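The sketch below shows the general shape of an S3 file sink in Parquet encode; the region, bucket, path, and credential values are placeholders:

```sql
CREATE SINK s3_file_sink AS
SELECT * FROM my_mv
WITH (
    connector = 's3',
    s3.region_name = 'us-east-1',           -- placeholder region
    s3.bucket_name = 'my-bucket',           -- placeholder bucket
    s3.path = 'sink-data/',                 -- placeholder path prefix
    s3.credentials.access = '<access-key>', -- placeholder credentials
    s3.credentials.secret = '<secret-key>',
    type = 'append-only',
    force_append_only = 'true'
) FORMAT PLAIN ENCODE PARQUET(force_append_only = 'true');
```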
File sinks must be append-only, and you must specify this explicitly after the FORMAT ... ENCODE ... statement.

A file is finalized once its row count reaches a threshold; use the max_row_count option in the WITH clause to configure this behavior.
A file is also rolled over after a time interval elapses; use the rollover_seconds option in the WITH clause to configure this behavior.
You can set the path_partition_prefix option in the WITH clause to organize files into subdirectories based on their creation time. The available options are month, day, or hour. If not specified, files will be stored directly in the root directory without any time-based subdirectories.
Regarding file naming rules, files currently follow the naming pattern /Option<path_partition_prefix>/executor_id + timestamp.suffix, where the timestamp differentiates files batched by the rollover interval.
The output files look like below:
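A hypothetical listing, assuming path_partition_prefix = 'day', Parquet encode, and a 60-second rollover (the directory format, executor ID, and timestamps are made up for illustration):

```
2025-01-01/
    47244640257_1735689600.parquet
    47244640257_1735689660.parquet
```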