Learn best practices for maximizing the performance of your RisingWave deployments.
append only
on a table?CREATE MATERIALIZED VIEW max_account_balance AS SELECT max(balance) FROM account_table
. Since the account balance of each client is frequently changing, we cannot declare it as a source or an append-only table. Then RisingWave has to persist the balances from all the accounts to maintain the result, such as finding how much the second largest balance is when the first largest balance decreases.
Suppose we have another materialized view that tracks the time of the latest transaction among all the transactions, e.g., CREATE MATERIALIZED VIEW time_latest_transaction AS SELECT max(timestamp) FROM transactions
. Since the transaction is irreversible (even if one transaction is a mistake, we correct it with a new transaction), it is perfect for us to declare transactions as a source or an append-only table. Then RisingWave only needs to keep a single data point, i.e., the timestamp of the latest transaction, and simply compare it with the timestamp of a new transaction to update the result.
This append-only versus non-append-only difference can make an impact in a few use cases:
ORDER BY
clause and are only interested in the top N rows.t
using create table t(v1 int, v2 varchar, v3 timestamp);
. This table contains a large amount of historical data. When creating a new MV on top of this table, it requires backfilling all the historical data, which can be a time-consuming process. Then we create an MV called m
using create materialized view m as select v1, sum(v2) from t group by v1;
. The challenge during backfilling is that when reading data from table t
to build the MV m
, the order of the v1
values may be random. This randomness can lead to poor cache locality and trigger remote I/O, ultimately decreasing performance.
However, if we use create table t(v1 int primary key, v2 varchar);
instead to create a table, or define an intermediate mv like create materialized view tmp as select * from t order by v1;
. The storage of RisingWave will order rows by v1
, which means that rows with same v1
will be read consecutively. This adjustment enhances cache locality and thereby improves performance.
ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC)
, writing sink data ordered by order_id
, updated_at
helps store related rows consecutively. This improves data locality and allows the MV to backfill more efficiently by reducing random I/O and increasing cache hit rates.