Use the SQL statement below to connect RisingWave to an Amazon S3 source. RisingWave supports the CSV, NDJSON, and Parquet file formats.
For CSV data, specify the delimiter with the `delimiter` option in ENCODE properties.
Field | Notes |
---|---|
connector | Required. Supports the `s3` connector only. |
s3.region_name | Required. The service region. |
s3.bucket_name | Required. The name of the bucket the data source is stored in. |
s3.credentials.access | Required. The AWS access key ID. |
s3.credentials.secret | Required. The AWS secret access key. |
s3.endpoint_url | Conditional. The host URL for an S3-compatible object storage server. This allows users to use a different server instead of the standard S3 server. |
compression_format | Optional. Specifies the compression format of the files being read. You can define `compression_format` in the `CREATE TABLE` statement. When set to `gzip` or `gz`, the file reader reads all files with the `.gz` suffix. When set to `None` or left undefined, the file reader automatically reads and decompresses `.gz` and `.gzip` files. |
match_pattern | Conditional. This field is used to find object keys in s3.bucket_name that match the given pattern. Standard Unix-style glob syntax is supported. |
s3.assume_role | Optional. Specifies the ARN of an IAM role to assume when accessing S3. It allows temporary, secure access to S3 resources without sharing long-term credentials. |
refresh.interval.sec | Optional. Configures the time interval, in seconds, between file-listing operations. It determines the delay in discovering new files, with a default value of 60 seconds. |
Field | Notes |
---|---|
data_format | Supported data format: PLAIN. |
data_encode | Supported data encodes: CSV, JSON, PARQUET. |
without_header | This field is only for CSV encode, and it indicates whether the first line is the header. Accepted values: `true`, `false`. Default is `true`. |
delimiter | How RisingWave splits contents. For JSON encode, the delimiter is `\n`; for CSV encode, the delimiter can be one of `,`, `;`, or `E'\t'`. |
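Putting the parameters above together, a connection statement might look like the following sketch. The table name, column schema, bucket name, region, and credential values are all placeholders to be replaced with your own:

```sql
CREATE TABLE s3_sales (
    product_id INT,
    quantity INT
)
WITH (
    connector = 's3',
    s3.region_name = 'us-east-1',
    s3.bucket_name = 'example-bucket',
    s3.credentials.access = 'your-access-key-id',
    s3.credentials.secret = 'your-secret-access-key',
    match_pattern = '*.csv'
) FORMAT PLAIN ENCODE CSV (
    without_header = 'false',
    delimiter = ','
);
```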
Field | Notes |
---|---|
file | Optional. This column contains the file name that the current record comes from. |
offset | Optional. This column contains the corresponding byte offset (record offset for Parquet files) where the current message begins. |
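These hidden columns are exposed with an `INCLUDE` clause. A minimal sketch, where the column aliases `source_file` and `source_offset` and the connection values are illustrative placeholders:

```sql
CREATE TABLE s3_with_meta (
    product_id INT,
    quantity INT
)
INCLUDE file AS source_file
INCLUDE offset AS source_offset
WITH (
    connector = 's3',
    s3.region_name = 'us-east-1',
    s3.bucket_name = 'example-bucket',
    s3.credentials.access = 'your-access-key-id',
    s3.credentials.secret = 'your-secret-access-key'
) FORMAT PLAIN ENCODE CSV (without_header = 'false', delimiter = ',');
```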
Use the `file_scan()` table function to read Parquet files from S3, either a single file or a directory of Parquet files.
For example, suppose there is a Parquet file `sales_data.parquet` that stores a company's sales data, containing the following fields:

- `product_id`: Product ID
- `sales_date`: Sales date
- `quantity`: Sales quantity
- `revenue`: Sales revenue
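As a sketch, such a file could be queried ad hoc with `file_scan()`. This assumes a positional argument order of file format, storage type, region, access key, secret key, and file location; the region, credentials, and bucket path below are placeholders:

```sql
SELECT * FROM file_scan(
    'parquet',
    's3',
    'us-east-1',
    'your-access-key-id',
    'your-secret-access-key',
    's3://example-bucket/sales_data.parquet'
);
```

Pointing the last argument at a directory (for example, `s3://example-bucket/sales/`) would scan all Parquet files under that prefix instead of a single file.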