A source is a resource that RisingWave can read data from. You can create a source in RisingWave using the CREATE SOURCE command.
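If you want to persist the data from the source in RisingWave, use the CREATE TABLE command with connector settings instead. For more details about the differences between sources and tables, see here.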
Regardless of whether the data is persisted in RisingWave, you can create materialized views to perform analysis or data transformations.
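
As a minimal sketch of this workflow, the statements below create a Kafka-backed source and a materialized view on top of it. The source name, column names, topic, and broker address are placeholders chosen for illustration; check the connector parameters against the Kafka connector page before using them.

```sql
-- Hypothetical source: the names, topic, and broker address are placeholders.
CREATE SOURCE IF NOT EXISTS orders_src (
    order_id BIGINT,
    amount DOUBLE PRECISION,
    order_time TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'localhost:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- A materialized view that aggregates the data read from the source.
CREATE MATERIALIZED VIEW order_totals AS
SELECT count(*) AS order_count, sum(amount) AS total_amount
FROM orders_src;
```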
To record when a row is processed by RisingWave, you can define a generated column (<column_name> timestamptz AS proctime()) when creating the table or source. See also proctime().
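
A processing-time column might be declared as in the sketch below. The source name, the other columns, and the Kafka settings are illustrative assumptions; only the proc_time definition follows the pattern described above.

```sql
-- Hypothetical source; only the proc_time line illustrates the pattern.
CREATE SOURCE clicks_src (
    user_id BIGINT,
    url VARCHAR,
    -- Generated column populated by proctime() rather than by the payload.
    proc_time TIMESTAMPTZ AS proctime()
) WITH (
    connector = 'kafka',
    topic = 'clicks',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;
```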
For a source whose schema comes from an external connector, list * first to represent all columns from the connector, and then define the generated column after it. See the example below.
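
The following sketch assumes an Avro topic whose schema is fetched from a schema registry and contains an integer field named int32_field. The topic, the registry URL, and the field name are all hypothetical.

```sql
CREATE SOURCE avro_src (
    -- All columns derived from the external schema come first.
    *,
    -- Generated column built on a connector-provided column
    -- (int32_field is assumed to exist in the registered schema).
    gen_i32_field INT AS int32_field + 2
) WITH (
    connector = 'kafka',
    topic = 'avro_topic',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE AVRO (
    schema.registry = 'http://localhost:8081'
);
```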
Parameter | Description |
---|---|
source_name | The name of the source. If a schema name is given (for example, CREATE SOURCE <schema>.<source> …), then the source is created in the specified schema. Otherwise it is created in the current schema. |
col_name | The name of a column. |
data_type | The data type of a column. With the struct data type, you can create a nested table. Elements in a nested table need to be enclosed with angle brackets (<>). |
generation_expression | The expression for the generated column. For details about generated columns, see Generated columns. |
watermark_clause | A clause that defines the watermark for a timestamp column. The syntax is WATERMARK FOR column_name as expr. For details about watermarks, refer to Watermarks. |
INCLUDE clause | Extract fields not included in the payload as separate columns. For more details on its usage, see INCLUDE clause. |
WITH clause | Specify the connector settings here if trying to store all the source data. See Supported sources for the full list of supported sources, as well as links to the connector pages that detail the syntax for each source. |
FORMAT and ENCODE options | Specify the data format and the encoding format of the source data. To learn about the supported data formats, see Data formats and encoding options. |
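
To show how several of these parameters fit together, here is a hedged sketch that combines a nested struct column, a watermark clause, an INCLUDE clause, a WITH clause, and FORMAT and ENCODE options. Every name and connector value is a placeholder, and the 5-second delay is an arbitrary choice.

```sql
CREATE SOURCE trips_src (
    trip_id BIGINT,
    -- Nested column: struct elements are enclosed in angle brackets.
    fare STRUCT<amount DOUBLE PRECISION, currency VARCHAR>,
    completed_at TIMESTAMPTZ,
    -- Watermark clause: the watermark trails completed_at by 5 seconds.
    WATERMARK FOR completed_at AS completed_at - INTERVAL '5' SECOND
)
-- Expose the Kafka message timestamp, which is not part of the payload.
INCLUDE timestamp AS kafka_ts
WITH (
    connector = 'kafka',
    topic = 'trips',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;
```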
schema_definition
. For more information on how to create a watermark, see Watermarks.
ALTER SOURCE [ADD COLUMN | REFRESH SCHEMA] for shared sources is available since version 2.2.
Use streaming_use_shared_source to control whether shared source is enabled.
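
Assuming streaming_use_shared_source is set like other session variables, toggling it might look like the sketch below. Verify on your own version whether the setting only affects sources created after it is changed.

```sql
-- Turn shared sources off for the current session.
SET streaming_use_shared_source = false;

-- Turn them back on.
SET streaming_use_shared_source = true;
```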
To disable shared sources through configuration, edit the risingwave.toml configuration file and set stream_enable_shared_source to false.
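With a non-shared source, the CREATE SOURCE statement does not start ingestion by itself: a SourceExecutor will be created to start the process of data ingestion only when a downstream job, such as a materialized view, references the source.
Each SourceExecutor consumes Kafka resources independently, adding pressure to both the Kafka broker and RisingWave.
Independent SourceExecutor instances could also result in different consumption progress, causing temporary inconsistencies when joining materialized views.
With a shared source, the CREATE SOURCE statement creates a single SourceExecutor immediately.
Materialized views that reference the same source share that SourceExecutor.
You can monitor progress by checking Kafka Consumer Lag Size in the Grafana dashboard (under Streaming).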
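
The shared-source case can be pictured with the sketch below, in which two materialized views read from one Kafka source. All object names and connector values are hypothetical, and whether the source is actually shared depends on the settings described earlier.

```sql
-- One source definition over a Kafka topic.
CREATE SOURCE events_src (
    user_id BIGINT,
    event_type VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'events',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;

-- Two downstream materialized views that reference the same source.
CREATE MATERIALIZED VIEW events_per_user AS
SELECT user_id, count(*) AS event_count
FROM events_src
GROUP BY user_id;

CREATE MATERIALIZED VIEW purchase_events AS
SELECT user_id, event_type
FROM events_src
WHERE event_type = 'purchase';
```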
A CREATE TABLE statement can provide similar benefits to shared sources, except that it needs to persist all consumed data.
For a table with a connector, downstream materialized views backfill historical data from the table instead of from the external source, which may be more efficient and put less pressure on the external system. This also gives tables a stronger consistency guarantee, because the historical data is guaranteed to be present.
Tables offer other features that enhance their utility in data ingestion workflows. See Table with connectors.
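
For comparison, a table with a connector could be declared as in the following sketch. The table name, columns, primary key, and Kafka settings are illustrative assumptions.

```sql
-- Hypothetical table with a connector: consumed data is persisted in RisingWave.
CREATE TABLE orders_tbl (
    order_id BIGINT PRIMARY KEY,
    amount DOUBLE PRECISION
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;

-- A materialized view created later backfills from the table, not from Kafka.
CREATE MATERIALIZED VIEW large_orders AS
SELECT order_id, amount
FROM orders_tbl
WHERE amount > 100;
```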