Sink data from RisingWave to Delta Lake
This guide describes how to sink data from RisingWave to Delta Lake. Delta Lake is an open-source storage framework designed to allow you to build a lakehouse architecture with another compute engine. For more information, see Delta Lake.
Prerequisites
- Ensure you already have a Delta Lake table that you can sink data to. For additional guidance on creating a table and setting up Delta Lake, refer to this quickstart guide.
- Ensure you have an upstream materialized view or source that you can sink data from.
Spark compatibility
Version 2.4 of Delta Lake is compatible with version 3.4 of Spark.
Versions 2.3, 2.2, and 2.1 of Delta Lake are compatible with version 3.3 of Spark.
Syntax
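The general shape of the statement is sketched below. It assumes the connector is identified as deltalake; sink_name, sink_from, select_query, and connector_parameter are placeholders rather than literal names. Check it against the CREATE SINK reference for your RisingWave version.

```sql
-- Template only: bracketed parts are optional or alternatives, not literal SQL.
CREATE SINK [ IF NOT EXISTS ] sink_name
[ FROM sink_from | AS select_query ]
WITH (
    connector = 'deltalake',
    connector_parameter = 'value', ...
);
```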
Parameters
Parameter Names | Description |
---|---|
type | Required. Currently, only append-only is supported. |
location | Required. The file path of the Delta Lake table, as specified when the Delta Lake table was created. For AWS, start with s3:// or s3a://; for GCS, start with gs://; for local files, start with file://. |
s3.endpoint | Required. Endpoint of the S3. For MinIO object store backend, it should be http://. For AWS S3, refer to S3. |
s3.access.key | Required. Access key of the S3 compatible object store. |
s3.secret.key | Required. Secret key of the S3 compatible object store. |
gcs.service.account | Required for GCS. Specifies the service account JSON file as a string. |
commit_checkpoint_interval | Optional. Commit every N checkpoints (N > 0). The default depends on the sink_decouple setting: if sink_decouple is true (the default), commit_checkpoint_interval defaults to 10; if sink_decouple is false, it defaults to 1. Setting sink_decouple to false and commit_checkpoint_interval to a value larger than 1 causes an error. |
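
As an illustration of how the parameters above might fit together, the sketch below sets commit_checkpoint_interval explicitly. The sink name my_delta_sink and the upstream object my_upstream_mv are hypothetical, the upstream is assumed to be append-only, and the bucket, endpoint, and credentials are placeholders.

```sql
-- Hypothetical names; replace the location, endpoint, and keys with your own.
CREATE SINK my_delta_sink FROM my_upstream_mv
WITH (
    connector = 'deltalake',
    type = 'append-only',
    location = 's3a://my-delta-lake-bucket/path/to/table',
    s3.endpoint = 'https://s3.ap-southeast-1.amazonaws.com',
    s3.access.key = '<your-access-key>',
    s3.secret.key = '<your-secret-key>',
    -- Commit once every 5 checkpoints; valid only while sink_decouple is true.
    commit_checkpoint_interval = '5'
);
```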
Example
Here is a step-by-step example on how you can sink data from RisingWave to Delta Lake.
Create a Delta Lake table
In a spark-sql shell, create a Delta table. For more information, see the Delta Lake quickstart.
For example, the following spark-sql command creates a Delta Lake table in AWS S3. The table is in an S3 bucket named my-delta-lake-bucket in region ap-southeast-1 and under the path path/to/table. Before running the following command to create a Delta Lake table, create an empty directory path/to/table. The full URL of the table location is s3://my-delta-lake-bucket/path/to/table.
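A minimal sketch of such a command follows. The two-column schema (id, name) is an assumption made for illustration, and the s3a:// scheme is used on the assumption that the Spark session reaches S3 through the Hadoop S3A connector; the spark-sql shell is assumed to be started with Delta Lake and S3 credentials already configured.

```sql
-- Run inside spark-sql. The schema is illustrative; use your own columns.
CREATE TABLE delta.`s3a://my-delta-lake-bucket/path/to/table` (id INT, name STRING)
USING DELTA;
```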
Note that only S3-compatible object stores, such as AWS S3 or MinIO, are supported.
Create an upstream materialized view or source
The following query creates a source using the built-in load generator, which creates mock data. For more details, see CREATE SOURCE and Generate test data. You can transform the data using additional SQL queries if needed.
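One way such a source might look is sketched here. The source name s1_source and its columns are hypothetical, chosen to match a two-column (id, name) Delta table; the datagen options shown are assumed to match your RisingWave version.

```sql
-- Hypothetical append-only source producing mock rows with the load generator.
CREATE SOURCE s1_source (id int, name varchar)
WITH (
    connector = 'datagen',
    fields.id.kind = 'sequence',      -- ids count up from start to end
    fields.id.start = '1',
    fields.id.end = '10000',
    fields.name.kind = 'random',      -- random strings of the given length
    fields.name.length = '10',
    datagen.rows.per.second = '200'
) FORMAT PLAIN ENCODE JSON;
```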
You can also choose to create an upsert table, which supports in-place updates. For more details on creating a table, see CREATE TABLE.
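As a sketch of the table alternative, the statements below create a table with a primary key and then change a row in place. The table name s1_table and its columns are hypothetical.

```sql
-- Hypothetical table; the primary key lets rows be updated in place.
CREATE TABLE s1_table (id int PRIMARY KEY, name varchar);

INSERT INTO s1_table VALUES (1, 'alice'), (2, 'bob');

-- This produces an update rather than a new row, so the table is not append-only.
UPDATE s1_table SET name = 'carol' WHERE id = 2;
```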
Create a sink
Append-only sink from append-only source
If you have an append-only source and want to create an append-only sink, set type = append-only in the CREATE SINK query.
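A sketch of such a query follows, assuming a hypothetical append-only source named s1_source and a sink named s1_sink. The endpoint follows the standard AWS regional format for ap-southeast-1, and the keys are placeholders.

```sql
-- Hypothetical names; replace the credentials with your own.
CREATE SINK s1_sink FROM s1_source
WITH (
    connector = 'deltalake',
    type = 'append-only',
    location = 's3a://my-delta-lake-bucket/path/to/table',
    s3.endpoint = 'https://s3.ap-southeast-1.amazonaws.com',
    s3.access.key = '<your-access-key>',
    s3.secret.key = '<your-secret-key>'
);
```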
Append-only sink from upsert table
If you have a table or source that is not of type append-only and want to create an append-only sink, set type = append-only and set force_append_only = true in the CREATE SINK query.
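A sketch of this variant follows, assuming a hypothetical upstream table named s1_table that accepts updates and a sink named s1_table_sink. The keys are placeholders.

```sql
-- Hypothetical names; force_append_only makes the sink append-only
-- even though the upstream table is not.
CREATE SINK s1_table_sink FROM s1_table
WITH (
    connector = 'deltalake',
    type = 'append-only',
    force_append_only = 'true',
    location = 's3a://my-delta-lake-bucket/path/to/table',
    s3.endpoint = 'https://s3.ap-southeast-1.amazonaws.com',
    s3.access.key = '<your-access-key>',
    s3.secret.key = '<your-secret-key>'
);
```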
Query data in Delta Lake
To ensure that data is flushed to the sink, use the FLUSH command in RisingWave.
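In a RisingWave SQL session, the command takes no arguments:

```sql
-- Run in RisingWave, not in spark-sql.
FLUSH;
```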
The following query checks the total number of records written to the Delta Lake table using spark-sql.
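A count query along these lines could be used, assuming the table lives at the example location and the spark-sql session reads it through the s3a:// scheme:

```sql
-- Run inside spark-sql against the path-based Delta table.
SELECT count(*) FROM delta.`s3a://my-delta-lake-bucket/path/to/table`;
```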