Ingest data from Azure Blob

Use the SQL statement below to connect RisingWave to Azure Blob Storage using the Azblob connector. Note that the Azblob connector does not guarantee that files are read in sequence or that each file is read completely.

Syntax

CREATE SOURCE [ IF NOT EXISTS ] source_name
schema_definition
[INCLUDE { file | offset | payload } [AS <column_name>]]
WITH (
   connector = 'azblob',
   connector_parameter = 'value', ...
)
FORMAT data_format ENCODE data_encode (
   without_header = 'true' | 'false',
   delimiter = 'delimiter'
);

schema_definition:

(
   column_name data_type [ PRIMARY KEY ], ...
   [ PRIMARY KEY ( column_name, ... ) ]
)

Connector parameters

Field	Notes
azblob.container_name	Required. The name of the container the data source is stored in.
azblob.credentials.account_name	Optional. The name of the Azure Blob Storage account.
azblob.credentials.account_key	Optional. The account key for the Azure Blob Storage account.
azblob.endpoint_url	Required. The URL of the Azure Blob Storage service endpoint.
match_pattern	Conditional. Set to find object keys in `azblob.container_name` that match the given pattern. Standard Unix-style glob syntax is supported. A typical usage follows the `prefix/*.suffix` pattern. For example, `your_directory/*.parquet` matches all Parquet files under `your_directory/`. If `match_pattern` does not contain `/`, the scan runs from the container root.
compression_format	Optional. Specifies the compression format of the file being read. When set to `gzip` or `gz`, the file reader reads all files with the `.gz` suffix; when set to `None` or not defined, the file reader automatically reads and decompresses `.gz` and `.gzip` files.
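
The parameters above can be combined in a single `WITH` clause. The following is a minimal sketch, not a tested configuration: the source name `s_gz`, the directory `your_directory`, and the `.csv.gz` file naming are assumptions made for illustration, and the `xxx` values are placeholders.

```sql
-- Sketch only: s_gz, your_directory, and the .csv.gz naming are hypothetical.
CREATE SOURCE s_gz (
    id int,
    name varchar,
    age int
)
WITH (
    connector = 'azblob',
    azblob.container_name = 'xxx',
    azblob.credentials.account_name = 'xxx',
    azblob.credentials.account_key = 'xxx',
    azblob.endpoint_url = 'xxx',
    -- Glob pattern: only object keys under your_directory/ ending in .csv.gz
    match_pattern = 'your_directory/*.csv.gz',
    -- Read the matched files as gzip-compressed
    compression_format = 'gzip'
) FORMAT PLAIN ENCODE CSV (
    without_header = 'true',
    delimiter = ','
);
```

Leaving out `compression_format` should fall back to the automatic handling of `.gz` and `.gzip` files described in the table.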

Other parameters

Field	Notes
data_format	Supported data format: PLAIN.
data_encode	Supported data encodes: CSV, JSON, PARQUET.
without_header	This field applies only to CSV encode. It indicates whether the first line is a header. Accepted values: `true`, `false`. Default is `true`.
delimiter	How RisingWave splits contents. For JSON encode, the delimiter is `\n`; for CSV encode, the delimiter can be one of `,`, `;`, `E'\t'`.

Additional columns

Field	Notes
file	Optional. The column contains the name of the file that the current record comes from.
offset	Optional. The column contains the byte offset (record offset for Parquet files) at which the current message begins.
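
As an illustration of the `INCLUDE` clause from the syntax section, the sketch below attaches the file name to every record. The source name `s_with_file`, the column alias `source_file`, and the materialized view `rows_per_file` are hypothetical names chosen for this example; the `xxx` values are placeholders.

```sql
-- Sketch only: s_with_file, source_file, and rows_per_file are hypothetical names.
CREATE SOURCE s_with_file (
    id int,
    name varchar,
    age int
)
INCLUDE file AS source_file  -- name of the file each record was read from
WITH (
    connector = 'azblob',
    azblob.container_name = 'xxx',
    azblob.credentials.account_name = 'xxx',
    azblob.credentials.account_key = 'xxx',
    azblob.endpoint_url = 'xxx'
) FORMAT PLAIN ENCODE CSV (
    without_header = 'true',
    delimiter = ','
);

-- Count ingested records per source file.
CREATE MATERIALIZED VIEW rows_per_file AS
SELECT source_file, count(*) AS row_count
FROM s_with_file
GROUP BY source_file;
```

Replacing `file` with `offset` in the `INCLUDE` clause would expose the offset column in the same way.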

Read Parquet files from Azure Blob

Added in v2.3.0.

You can use the table function file_scan() to read Parquet files from Azure Blob, either a single file or a directory of Parquet files.

Function signature

file_scan (parquet, azblob, account_name, account_key, endpoint, file_location)

When reading a directory of Parquet files, the schema will be based on the first Parquet file listed. Please ensure that all Parquet files in the directory have the same schema.
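
A call following the signature above might look like the sketch below. It assumes that the first two arguments are passed as the string literals `'parquet'` and `'azblob'`; the remaining `xxx` values are placeholders, and the exact form of `file_location` is not specified in this section.

```sql
-- Sketch only: argument order follows the signature above; all 'xxx' values are placeholders.
SELECT *
FROM file_scan(
    'parquet',  -- file format
    'azblob',   -- storage type
    'xxx',      -- account_name
    'xxx',      -- account_key
    'xxx',      -- endpoint
    'xxx'       -- file_location: a single Parquet file or a directory of Parquet files
);
```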

Examples

Here are examples of connecting RisingWave to the Azblob source to read data from individual streams.

CSV

CREATE SOURCE s(
    id int,
    name varchar,
    age int
)
WITH (
    connector = 'azblob',
    azblob.container_name = 'xxx',
    azblob.credentials.account_name = 'xxx',
    azblob.credentials.account_key = 'xxx',
    azblob.endpoint_url = 'xxx'
) FORMAT PLAIN ENCODE CSV (
    without_header = 'true',
    delimiter = ',' -- set delimiter = E'\t' for tab-separated files
);
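
For the JSON and PARQUET encodes listed under Other parameters, the statement would presumably differ only in the `ENCODE` clause. The following sketches reuse the schema from the CSV example; the source names `s_json` and `s_parquet` are hypothetical, and the CSV-only options `without_header` and `delimiter` are left out.

```sql
-- Sketch only: s_json and s_parquet are hypothetical names; 'xxx' values are placeholders.

-- JSON-encoded files
CREATE SOURCE s_json (
    id int,
    name varchar,
    age int
)
WITH (
    connector = 'azblob',
    azblob.container_name = 'xxx',
    azblob.credentials.account_name = 'xxx',
    azblob.credentials.account_key = 'xxx',
    azblob.endpoint_url = 'xxx'
) FORMAT PLAIN ENCODE JSON;

-- Parquet files
CREATE SOURCE s_parquet (
    id int,
    name varchar,
    age int
)
WITH (
    connector = 'azblob',
    azblob.container_name = 'xxx',
    azblob.credentials.account_name = 'xxx',
    azblob.credentials.account_key = 'xxx',
    azblob.endpoint_url = 'xxx'
) FORMAT PLAIN ENCODE PARQUET;
```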
