Creating vector indexes
Use theCREATE INDEX
command with vector-specific syntax to create vector indexes. For more details, see CREATE INDEX.
Syntax
Index types
Before creating a vector index, you may create a sample tableitem
to reference the table name and column names. Currently, we only support creating vector indexes on append-only inputs, such as append-only tables or materialized views. Therefore, we have to specify the table as append-only here:
-
FLAT index: Provides exact results by comparing the query vector against all stored vectors.
-
HNSW index: Hierarchical Navigable Small World (HNSW) index that provides approximate nearest neighbor search with better performance for large datasets.
ef_search
by setting the session variable batch_hnsw_ef_search
(the default value is 40).
Parameters
Parameter | Description | Valid for |
---|---|---|
distance_type | Distance metric to use: l2 , cosine , l1 , or inner_product | FLAT, HNSW |
m | Optional. Maximum number of connections per node | HNSW |
ef_construction | Optional. Size of dynamic candidate list during construction | HNSW |
max_level | Optional. Maximum level of the HNSW graph | HNSW |
Vector distance operators
RisingWave provides specialized operators for calculating vector distances:Operator | Function | Description |
---|---|---|
<-> | l2_distance() | Euclidean (L2) distance |
<=> | cosine_distance() | Cosine distance |
<+> | l1_distance() | Manhattan (L1) distance |
<#> | Negative inner product | Negative inner product distance |
Vector similarity search
Use vector distance operators withORDER BY
and LIMIT
to perform similarity search:
Vector indexes on function expressions
You can create vector indexes on function expressions instead of raw columns. This allows you to avoid storing a separate vector column, saving storage and reducing maintenance costs.- Create the table to include the input column
embedding
column is used to store the embedding generated from the description
column. If you create the vector index directly from description
column with function expression, you don’t have to store raw embedding
in the table.
- Define the user-defined function (UDF)
- Create the vector index on the function expression
get_embedding(description)
is used as the index expression.
This approach avoids materializing a separate vector column in the table, which reduces storage costs and keeps the table schema simpler.
Examples
Basic vector similarity search
Using cosine distance type
The SQL query depends on the type of vector index you created:- If the vector index is built on a raw embedding column, use the raw column in your
ORDER BY
clause.
- If the vector index is built using a function expression, use the same function expression in your
ORDER BY
clause.