Skip to main content

Sink data from RisingWave to Google BigQuery

This guide describes how to sink data from RisingWave to Google BigQuery.

BigQuery is Google's fully managed data warehouse and data analytics platform, capable of handling and analyzing large volumes of data as it is highly scalable.

You can test out this process on your own device by using the big-query-sink demo in the integration_test directory of the RisingWave repository.

Premium Edition Feature

This feature is only available in the premium edition of RisingWave. The premium edition offers additional advanced features and capabilities beyond the free and community editions. If you have any questions about upgrading to the premium edition, please contact our sales team at sales@risingwave-labs.com.

Public Preview

This feature is in the public preview stage, meaning it's nearing the final product but is not yet fully stable. If you encounter any issues or have feedback, please contact us through our Slack channel. Your input is valuable in helping us improve the feature. For more information, see our Public preview feature list.

Prerequisites

Before sinking data from RisingWave to BigQuery, please ensure the following:

  • The BigQuery table you want to sink to is accessible from RisingWave.
  • Ensure you have an upstream materialized view or table in RisingWave that you can sink data from.

Syntax

CREATE SINK [ IF NOT EXISTS ] sink_name
[FROM sink_from | AS select_query]
WITH (
connector='bigquery',
connector_parameter = 'value', ...
);

Parameters

Parameter NamesDescription
sink_nameName of the sink to be created.
sink_fromA clause that specifies the direct source from which data will be output. sink_from can be a materialized view or a table. Either this clause or select_query query must be specified.
AS select_queryA SELECT query that specifies the data to be output to the sink. Either this query or a sink_from clause must be specified. See SELECT for the syntax and examples of the SELECT command.
typeRequired. Data format. Allowed formats:
  • append-only: Output data with insert operations.
  • upsert: For this type, you need to set corresponding permissions and primary keys based on the Document of BigQuery.
force_append_onlyOptional. If true, forces the sink to be append-only, even if it cannot be.
bigquery.local.pathOptional. The file path leading to the JSON key file located in your local server. Details can be found in Service Accounts under your Google Cloud account. Either bigquery.local.path or bigquery.s3.path must be specified.
bigquery.s3.pathOptional. The file path leading to the JSON key file located in S3. Details can be found in Service Accounts under your Google Cloud account. At least one of bigquery.local.path or bigquery.s3.path must be specified.
bigquery.projectRequired. The BigQuery project ID.
bigquery.datasetRequired. The BigQuery dataset ID.
bigquery.tableRequired. The BigQuery table you want to sink to.
bigquery.retry_timesOptional. The number of times the system should retry a BigQuery insert operation before ultimately returning an error. Defaults to 5.
auto_createOptional. Defaults to false. If true, a new table will be automatically created in BigQuery when the specified table is not found.
aws.credentials.access_key_idOptional. The access key of the S3 file. This must be specified if sinking to an S3 file.
aws.credentials.secret_access_keyOptional. The secret access key of the S3 file. This must be specified if sinking to an S3 file.
regionOptional. The service region of the S3 file. This must be specified if sinking to an S3 file.

Examples

We can create a BigQuery sink with a local JSON key file.

CREATE SINK big_query_sink_local
FROM mv1
WITH (
connector = 'bigquery',
type = 'append-only',
bigquery.local.path= '${bigquery_service_account_json_path}',
bigquery.project= '${project_id}',
bigquery.dataset= '${dataset_id}',
bigquery.table= '${table_id}',
force_append_only='true'
);

Or we can create a BigQuery sink with an S3 JSON key file.

CREATE SINK big_query_sink_s3
FROM mv1
WITH (
connector = 'bigquery',
type = 'append-only',
bigquery.s3.path= '${s3_service_account_json_path}',
bigquery.project= '${project_id}',
bigquery.dataset= '${dataset_id}',
bigquery.table= '${table_id}',
aws.credentials.access_key_id = '${aws_access_key}',
aws.credentials.secret_access_key = '${aws_secret_access}',
region = '${aws_region}',
force_append_only='true',
);

Data type mapping

RisingWave Data TypeBigQuery Data Type
booleanbool
smallintint64
integerint64
bigintint64
realunsupported
double precisionfloat64
numericnumeric
datedate
character varying (varchar)string
time without time zonetime
timestamp without time zonedatetime
timestamp with time zonetimestamp
intervalinterval
structstruct
arrayarray
byteabytes
JSONBJSON
serialint64

Help us make this doc better!