Learn best practices for maximizing the performance of your RisingWave deployments.
SELECT
statement? These columns should all appear in the INCLUDE
clause when creating the index.WHERE
condition in the batch queries? Suppose the batch query filters a column named timestamp with the condition timestamp between t1 and t2
, then the column timestamp
should be included in the index_column
. The same principle applies to any other filter conditions such as equality and inequalities.t
using create table t(v1 int, v2 varchar, v3 timestamp);
. This table contains a large amount of historical data. When creating a new MV on top of this table, it requires backfilling all the historical data, which can be a time-consuming process. Then we create an MV called m
using create materialized view m as select v1, sum(v2) from t group by v1;
. The challenge during backfilling is that when reading data from table t
to build the MV m
, the order of the v1
values may be random. This randomness can lead to poor cache locality and trigger remote I/O, ultimately decreasing performance.
However, if we use create table t(v1 int primary key, v2 varchar);
instead to create a table, or define an intermediate mv like create materialized view tmp as select * from t order by v1;
. The storage of RisingWave will order rows by v1
, which means that rows with same v1
will be read consecutively. This adjustment enhances cache locality and thereby improves performance.
append only
on a table?CREATE MATERIALIZED VIEW max_account_balance AS SELECT max(balance) FROM account_table
. Since the account balance of each client is frequently changing, we cannot declare it as a source or an append-only table. Then RisingWave has to persist the balances from all the accounts to maintain the result, such as finding how much the second largest balance is when the first largest balance decreases.
Suppose we have another materialized view that tracks the time of the latest transaction among all the transactions, e.g., CREATE MATERIALIZED VIEW time_latest_transaction AS SELECT max(timestamp) FROM transactions
. Since the transaction is irreversible (even if one transaction is a mistake, we correct it with a new transaction), it is perfect for us to declare transactions as a source or an append-only table. Then RisingWave only needs to keep a single data point, i.e., the timestamp of the latest transaction, and simply compare it with the timestamp of a new transaction to update the result.
This append-only versus non-append-only difference can make an impact in a few use cases:
ORDER BY
clause and are only interested in the top N rows.