Indexes

Indexes are usually created on one or more columns in a table and can greatly improve the performance of queries, particularly for tables that are large in size or accessed frequently. Indexes are usually created on non-primary columns to speed up point-select queries on those columns.

Benefits of using indexes

In general, using indexes can significantly enhance the performance and efficiency of the system.

When to use indexes

Indexes can be particularly useful for optimizing the performance of queries that retrieve a small number of records from a large dataset. In RisingWave, indexes can speed up batch queries.

How to use indexes

You can use the CREATE INDEX command to construct an index on a table or a materialized view. The syntax is as follows:

CREATE INDEX [IF NOT EXISTS] index_name ON object_name ( index_column [ ASC | DESC ], [, ...] )
[ INCLUDE ( include_column [, ...] ) ]
[ DISTRIBUTED BY ( distributed_column [, ...] ) ];

In short,

All columns from the materialized view that are referenced in the SELECT clause should be included in the INCLUDE clause when creating the index.
Any columns used in the WHERE clause of batch queries should be specified in the index_column section of the index. For example, if a batch query includes a filter such as timestamp BETWEEN t1 AND t2, the timestamp column should be part of the index_column. The same principle applies to any other filter conditions such as equality and inequalities.

Here is a simple example. Let’s create two tables, customers and orders.

CREATE TABLE customers (
    c_custkey INTEGER,
    c_name VARCHAR,
    c_address VARCHAR,
    c_nationkey INTEGER,
    c_phone VARCHAR,
    c_acctbal NUMERIC,
    c_mktsegment VARCHAR,
    c_comment VARCHAR,
    PRIMARY KEY (c_custkey)
);

CREATE TABLE orders (
    o_orderkey BIGINT,
    o_custkey INTEGER,
    o_orderstatus VARCHAR,
    o_totalprice NUMERIC,
    o_orderdate DATE,
    o_orderpriority VARCHAR,
    o_clerk VARCHAR,
    o_shippriority INTEGER,
    o_comment VARCHAR,
    PRIMARY KEY (o_orderkey)
);

If you want to speed up the query of fetching a customer record by the phone number, you can build an index on the c_phone column in the customers table.

CREATE INDEX idx_c_phone on customers(c_phone);

SELECT * FROM customers where c_phone = '123456789';

SELECT * FROM customers where c_phone in ('123456789', '987654321');

If you want to speed up the query of fetching all the orders of a customer by the customer key, you can build an index on the o_custkey column in the orders table.

CREATE INDEX idx_o_custkey ON orders(o_custkey);

SELECT * FROM customers JOIN orders ON c_custkey = o_custkey
WHERE c_phone = '123456789';

How to decide which columns to include?

By default, RisingWave creates an index that includes all columns of a table or a materialized view if you omit the INCLUDE clause. This differs from the standard PostgreSQL. This is because RisingWave’s design as a cloud-native streaming database includes several key differences from PostgreSQL, including the use of an object store for more cost-effective storage, and the desire to make index creation as simple as possible for users who are not experienced with database systems. By including all columns, RisingWave ensures that an index will cover all of the columns touched by a query and eliminates the need for a primary table lookup, which can be slower in a cloud environment due to network communication. However, RisingWave still provides the option to include only specific columns using the INCLUDE clause for users who wish to do so. For example: If your queries only access certain columns, you can create an index that includes only those columns. The RisingWave optimizer will automatically select the appropriate index for your query.

-- Create an index that only includes necessary columns
CREATE INDEX idx_c_phone1 ON customers(c_phone) INCLUDE (c_name, c_address);

-- RisingWave will automatically use index idx_c_phone1 for the following query since it only access the indexed columns.
SELECT c_name, c_address FROM customers WHERE c_phone = '123456789';

You can use the EXPLAIN command to view the execution plan.

How to decide the index distribution key?

RisingWave will use the first index column as the distributed_column by default if you omit the DISTRIBUTED BY clause. RisingWave distributes the data across multiple nodes and uses the distributed_column to determine how to distribute the data based on the index. If your queries intend to use indexes but only provide the prefix of the index_column, it could be a problem for RisingWave to determine which node to access the index data from. To address this issue, you can specify the distributed_column yourself, ensuring that these columns are the prefixes of the index_column. For example:

-- Create an index with specified distributed columns
CREATE INDEX idx_c_phone2 ON customers(c_name, c_nationkey) DISTRIBUTED BY (c_name);

SELECT * FROM customers WHERE c_name = 'Alice';

Indexes on expressions

RisingWave supports creating indexes on expressions. Indexes on expressions are normally used to improve the performance of queries for frequently used expressions. To create an index on an expression, use the syntax:

CREATE INDEX index_name ON object_name (expression(column_name));

For example, if you often perform queries like this:

SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith';

Then you might want to create an index like the following to improve the performance of such queries:

CREATE INDEX people_names ON people ((first_name || ' ' || last_name));

This syntax is quite useful when working with a semi-structured table that utilizes the JSONB datatype. Here is an example of creating an index on a specific field within a JSONB column.

dev=> create table t(v jsonb);
CREATE_TABLE
dev=> create index i on t((v -> 'field')::int);
CREATE_INDEX
dev=> explain select * from t where (v->'field')::int = 123;
                                QUERY PLAN
--------------------------------------------------------------------------
 BatchExchange { order: [], dist: Single }
 └─BatchScan { table: i, columns: [v], scan_ranges: [CAST = Int32(123)] }
(2 rows)

More examples

CREATE TABLE t(k1 INT, k2 INT, v1 INT, v2 INT, PRIMARY KEY(k1,k2));

--- idx_t_k2_partial_columns doesn't include column v2.
CREATE INDEX idx_t_k2_partial_columns ON t(k2) INCLUDE (k1,v1);

--- idx_t_k2_partial_columns is not utilized.
EXPLAIN SELECT v2 FROM t WHERE k1=1;
 BatchExchange { order: [], dist: Single }
 └─BatchScan { table: t, columns: [v2], scan_ranges: [k1 = Int32(1)] }

--- idx_t_k2_partial_columns is utilized.
EXPLAIN SELECT v1 FROM t WHERE k2=1;
 BatchExchange { order: [], dist: Single }
 └─BatchScan { table: idx_t_k2_partial_columns, columns: [v1], scan_ranges: [k2 = Int32(1)] }

--- idx_t_k2_partial_columns is utilized. However since it doesn't include column v2, the plan requires an additional lookup join.
EXPLAIN SELECT v2 FROM t WHERE k2=1;
 BatchLookupJoin { type: Inner, predicate: idx_t_k2_partial_columns.k1 IS NOT DISTINCT FROM t.k1 AND idx_t_k2_partial_columns.k2 IS NOT DISTINCT FROM t.k2 AND (t.k2 = 1:Int32), lookup table: t }
 └─BatchExchange { order: [], dist: Single }
   └─BatchScan { table: idx_t_k2_partial_columns, columns: [k2, k1], scan_ranges: [k2 = Int32(1)] }

--- idx_t_k2_all_columns includes all columns.
CREATE INDEX idx_t_k2_all_columns ON t(k2);

--- idx_t_k2_all_columns is utilized. Compare the plan with the one that uses idx_t_k2_partial_columns.
EXPLAIN SELECT v2 FROM t WHERE k2=1;
 BatchExchange { order: [], dist: Single }
 └─BatchScan { table: idx_t_k2_all_columns, columns: [v2], scan_ranges: [k2 = Int32(1)] }

CREATE INDEX

Create an index constructed on a table or a materialized view.

DROP INDEX

Remove an index constructed on a table or a materialized view.

SHOW CREATE INDEX

See what query was used to create the specified index.

ALTER INDEX

Modify an index.

Get started

Working with data

Deploy & operate

Performance

Troubleshooting

Reference

Cloud

Benefits of using indexes

When to use indexes

How to use indexes

How to decide which columns to include?

How to decide the index distribution key?

Indexes on expressions

More examples

See also

CREATE INDEX

DROP INDEX

SHOW CREATE INDEX

ALTER INDEX

Get started

Working with data

Deploy & operate

Performance

Troubleshooting

Reference

Cloud

​Benefits of using indexes

​When to use indexes

​How to use indexes

​How to decide which columns to include?

​How to decide the index distribution key?

​Indexes on expressions

​More examples

​See also

CREATE INDEX

DROP INDEX

SHOW CREATE INDEX

ALTER INDEX

Benefits of using indexes

When to use indexes

How to use indexes

How to decide which columns to include?

How to decide the index distribution key?

Indexes on expressions

More examples

See also