Benefits of using indexes
In general, using indexes can significantly enhance the performance and efficiency of the system.When to use indexes
Indexes can be particularly useful for optimizing the performance of queries that retrieve a small number of records from a large dataset. In RisingWave, indexes can speed up batch queries.How to use indexes
You can use the CREATE INDEX command to construct an index on a table or a materialized view. The syntax is as follows:-
All columns from the materialized view that are referenced in the
SELECTclause should be included in theINCLUDEclause when creating the index. -
Any columns used in the
WHEREclause of batch queries should be specified in theindex_columnsection of the index. For example, if a batch query includes a filter such astimestamp BETWEEN t1 AND t2, the timestamp column should be part of theindex_column. The same principle applies to any other filter conditions such as equality and inequalities.
customers and orders.
c_phone column in the customers table.
o_custkey column in the orders table.
How to decide which columns to include?
By default, RisingWave creates an index that includes all columns of a table or a materialized view if you omit theINCLUDE clause. This differs from the standard PostgreSQL. This is because RisingWave’s design as a cloud-native streaming database includes several key differences from PostgreSQL, including the use of an object store for more cost-effective storage, and the desire to make index creation as simple as possible for users who are not experienced with database systems.
By including all columns, RisingWave ensures that an index will cover all of the columns touched by a query and eliminates the need for a primary table lookup, which can be slower in a cloud environment due to network communication. However, RisingWave still provides the option to include only specific columns using the INCLUDE clause for users who wish to do so.
For example:
If your queries only access certain columns, you can create an index that includes only those columns. The RisingWave optimizer will automatically select the appropriate index for your query.
You can use the EXPLAIN command to view the execution plan.
How to decide the index distribution key?
RisingWave will use the first index column as thedistributed_column by default if you omit the DISTRIBUTED BY clause. RisingWave distributes the data across multiple nodes and uses the distributed_column to determine how to distribute the data based on the index. If your queries intend to use indexes but only provide the prefix of the index_column, it could be a problem for RisingWave to determine which node to access the index data from. To address this issue, you can specify the distributed_column yourself, ensuring that these columns are the prefixes of the index_column.
For example: