This guide describes how to sink data from RisingWave to Delta Lake. Delta Lake is an open-source storage framework designed to let you build a lakehouse architecture with compute engines such as Spark, Flink, and Trino. For more information, see Delta Lake.
| Parameter Name | Description |
|---|---|
| `type` | Required. Currently, only `append-only` is supported. |
| `location` | Required. The file path that the Delta Lake table reads data from, as specified when creating the Delta Lake table. For AWS, start with `s3://` or `s3a://`; for GCS, start with `gs://`; for local files, start with `file://`. |
| `s3.endpoint` | Required. Endpoint of the S3-compatible object store. |
| `s3.access.key` | Required. Access key of the S3-compatible object store. |
| `s3.secret.key` | Required. Secret key of the S3-compatible object store. |
| `gcs.service.account` | Required for GCS. Specifies the service account JSON file as a string. |
| `commit_checkpoint_interval` | Optional. Commit every N checkpoints (N > 0). Default value is 10. The behavior of this field also depends on the `sink_decouple` setting. |
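Putting these parameters together, a `CREATE SINK` statement generally takes the shape sketched below. This is a minimal sketch: the sink name, source name, location, and credential values are placeholders, and it assumes the connector name `deltalake` with an S3 location.

```sql
CREATE SINK IF NOT EXISTS my_delta_sink
FROM my_source
WITH (
    connector = 'deltalake',
    type = 'append-only',
    location = 's3a://my-delta-lake-bucket/path/to/table',
    s3.endpoint = 'https://s3.ap-southeast-1.amazonaws.com',
    s3.access.key = '<your access key>',
    s3.secret.key = '<your secret key>'
);
```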
In a `spark-sql` shell, create a Delta table. For more information, see the Delta Lake quickstart.
For example, the following `spark-sql` command creates a Delta Lake table in AWS S3. The table is in an S3 bucket named `my-delta-lake-bucket`, in region `ap-southeast-1`, under the path `path/to/table`. Before running the command to create the Delta Lake table, create an empty directory `path/to/table`. The full URL of the table location is `s3://my-delta-lake-bucket/path/to/table`.
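A sketch of such a command, assuming a simple two-column schema (`id`, `name`); adjust the columns to match your data:

```sql
CREATE TABLE delta.`s3://my-delta-lake-bucket/path/to/table` (id INT, name STRING)
USING delta;
```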
Note that only S3-compatible object stores are supported, such as AWS S3 or MinIO.
If your source is `append-only` and you want to create an `append-only` sink, set `type = append-only` in the `CREATE SINK` query.
If your source is not `append-only` and you want to create an `append-only` sink, set `type = append-only` and `force_append_only = true` in the `CREATE SINK` query.
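For example, the following sketch creates an append-only sink from a non-append-only source. The sink and source names, location, and credentials are placeholders, and the statement assumes the connector name `deltalake`.

```sql
CREATE SINK my_force_append_sink
FROM my_updatable_table
WITH (
    connector = 'deltalake',
    type = 'append-only',
    force_append_only = 'true',
    location = 's3a://my-delta-lake-bucket/path/to/table',
    s3.endpoint = 'https://s3.ap-southeast-1.amazonaws.com',
    s3.access.key = '<your access key>',
    s3.secret.key = '<your secret key>'
);
```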
Before querying the sink, you can force a checkpoint with the `FLUSH` command in RisingWave so that the latest data is committed to the Delta Lake table.
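For example, in a RisingWave SQL session:

```sql
FLUSH;
```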
The following query checks the total number of records written to the Delta Lake table using `spark-sql`.
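A sketch of the query, assuming the table location used above:

```sql
SELECT COUNT(*) FROM delta.`s3://my-delta-lake-bucket/path/to/table`;
```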