Top-N by group
Top-N queries return only the N top-most or the N bottom-most records from a table or view based on a condition.
In RisingWave, a Top-N query includes a ranking function clause and a rank filtering condition. In the ranking function clause, you can include a PARTITION BY
clause to fetch top N rows per group.
Syntax
The syntax of the ranking_function_clause
is:
rank
cannot be included in column_list
.
You must follow the pattern exactly to construct a valid Top-N query.
Parameter | Description |
---|---|
function_name | RisingWave supports two window functions in top-N queries: row_number(): Returns the sequential row ordinal (1-based) of each row for each ordered partition.rank(): Returns the ordinal (1-based) rank of each row within the ordered partition. All peer rows receive the same rank value. The next row or set of peer rows receives a rank value which increments by the number of peers with the previous rank value. |
PARTITION BY clause | Specifies the partition columns. Each partition will have a Top-N result. |
ORDER BY clause | Specifies how the rows are ordered. |
rank_range | Specifies the range of the rank number. The rank range is required for the query to be recognized as a top-N query. The range can be specified in these forms. Examples: WHERE M < rank AND rank < N or WHERE rank between M and N. Optionally, you can specify any additional conditions to further filter the results. |
Example
Converting StreamOverWindow to StreamGroupTopN
Given the following table:
You may create the following ranked materialized view to sort the data over the partition v1
and order by v2
:
The operator responsible for sorting the data is StreamOverWindow
. This is a streaming operator that maintains the state of the window and computes the row number for each row in the partition.
This can be quite expensive, especially if the partition is large.
If you only need the top N rows from each partition, you can use the StreamGroupTopN
operator instead.
This can be done by simply adding a WHERE
clause to the query, and applying it to the row column (r
in the above example).
You can view the following query as an example, which maintains top 10 rows for each partition:
Given that the state size to maintain is a lot smaller, this means both reading and writing the state has less overhead.
As a result, the StreamGroupTopN
operator is much more efficient than the StreamOverWindow
operator.