This document explains how to modify the logic in streaming pipelines within RisingWave. Understanding these mechanisms is essential for effectively managing your data processing workflows.
NULL
for existing records.
cust_sales
:
cust_sales_new
with the new column sales_count
:
cust_sales
and rename cust_sales_new
to cust_sales
:
CREATE SINK ... FROM ...
statement, you have the option to specify without_backfill = true
to exclude existing data.
orders
is not a table but another materialized view, derived from tables order_items
and price
.
sales_count
to cust_sales
, we need to create the new materialized views cust_sales_new
and orders_new
first:
cust_sales
and rename cust_sales_new
to cust_sales
:
SINK INTO TABLE
supports modifying the streaming job logic in place. This approach decouples the streaming processing logic from the persistence layer (the materialization step).
Why not always use SINK INTO TABLE
? There are performance implications to consider. Materialized views can trust their upstreams to issue consistent records (for example, no duplicate inserts or deletes for the same primary key). However, tables can have multiple sinks writing records to them, which may cause conflicts. These conflicts require resolution, as specified by the ON CONFLICT OVERWRITE
clause.
This trade-off allows for dynamic changes to SINK INTO TABLE
, but comes with some performance overhead for conflict resolution.
Create the initial objects
t
:Convert to `SINK INTO TABLE`
varchar, int
to varchar, int, int
. Since this is a materialized view with potential downstream dependencies, we cannot simply drop and recreate it. Instead, let’s convert it to use SINK INTO TABLE
:Add column
DELETE
statement when some keys do not exist in the new version of the streaming job.ON CONFLICT OVERWRITE
clause instead, but you will need to first drop the old sink before creating the new one.adult_users
that tracks the number of users aged ≥ 18.
age >= 18
to age >= 16
as a straightforward solution. However, this is not feasible in stream processing since records with ages between 16 and 18 have already been filtered out. Therefore, the only option to restore the missing data is to recompute the entire stream from the beginning.
Therefore, we recommend persistently storing the source data in a long-term storage solution, such as a RisingWave table. This allows for the recomputation of the materialized view when altering the logic becomes necessary.