This section covers best practices for optimizing streaming queries. You can often achieve better performance by rewriting your SQL to follow certain recommended patterns.

SQL optimizations

UNION vs UNION ALL

UNION removes duplicates, while UNION ALL does not. If you know that your data does not contain duplicates, use UNION ALL to avoid the overhead of duplicate removal. Alternatively, if the duplicate set is small, you can also filter out duplicates when querying the result set.

OverWindow vs GroupTopN

OverWindow is a streaming operator that maintains the state of the window and computes the row number for each row in the partition.

Queries look something like:

select *, row_number() over (PARTITION BY v1 order by v2) r from t3

This can be optimized by using the GroupTopN operator instead.

Checkout Converting StreamOverWindow to StreamGroupTopN for more details.

Join optimizations

Checkout Join Optimizations for more details.