In stream processing, watermarks are essential for event-time processing logic and event-time-based operations. Watermarks are markers, or signals, that track the progress of event time, allowing you to process events within their corresponding time windows. A watermark is an estimate of the maximum event time observed so far: a threshold indicating that events still to arrive are expected to have a timestamp later than or equal to the current watermark. Events that arrive with a timestamp earlier than the current watermark are considered late and are not processed within their time windows.
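
Suppose the following events arrive, in this order, with these event-time timestamps:

Event | Timestamp |
---|---|
Event F | 11:59:30 AM |
Event G | 12:00:00 PM |
Event H | 12:00:11 PM |
Event I | 11:59:50 AM |

If the watermark is set to the maximum event time observed so far minus 10 seconds, the following watermarks are generated:

Event | Timestamp | Watermark |
---|---|---|
Event F | 11:59:30 AM | 11:59:20 AM |
Event G | 12:00:00 PM | 11:59:50 AM |
Event H | 12:00:11 PM | 12:00:01 PM |
Event I | 11:59:50 AM | 12:00:01 PM |

Event I arrives with a timestamp earlier than the current watermark (12:00:01 PM), so it is considered late and the watermark does not change.
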
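
The watermark column in the table above can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not RisingWave's implementation. It assumes the watermark is the maximum event time seen so far minus 10 seconds (the rule the table values imply), that Event H carries the timestamp 12:00:11 PM, and that Event I's timestamp is 11:59:50 AM, since that is the only reading under which the watermark stays at 12:00:01 PM. The names `assign_watermarks` and `t` are hypothetical helpers.

```python
from datetime import datetime, timedelta

# Assumed strategy: watermark = maximum event time seen so far minus 10 seconds.
DELAY = timedelta(seconds=10)


def assign_watermarks(events, delay=DELAY):
    """Return (name, timestamp, watermark, is_late) for each event in arrival order."""
    rows = []
    watermark = None  # no watermark exists until the first event arrives
    for name, ts in events:
        # An event is late if it is older than the watermark in effect on arrival.
        is_late = watermark is not None and ts < watermark
        candidate = ts - delay
        # The watermark only moves forward, never backward.
        if watermark is None or candidate > watermark:
            watermark = candidate
        rows.append((name, ts, watermark, is_late))
    return rows


def t(hms):
    """Parse a 24-hour time of day into a datetime (the date itself is arbitrary)."""
    return datetime.strptime("2024-01-01 " + hms, "%Y-%m-%d %H:%M:%S")


events = [
    ("F", t("11:59:30")),
    ("G", t("12:00:00")),
    ("H", t("12:00:11")),
    ("I", t("11:59:50")),  # assumed AM: arrives after H but is older than the watermark
]

rows = assign_watermarks(events)
for name, ts, wm, late in rows:
    print(name, ts.time(), wm.time(), "late" if late else "on time")
```

Only Event I is flagged as late: by the time it arrives, Event H has already advanced the watermark past its timestamp.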
The syntax of the `WATERMARK` clause in RisingWave is `WATERMARK FOR column_name AS expr`, where:

`column_name` is a column that is created when generating the source, usually the event time column.

`expr` specifies the watermark generation strategy. The return type of the watermark must be of type `timestamp`. A watermark will be updated if the return value is greater than the current watermark.

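For example, the watermark generation strategy can be specified as the maximum observed timestamp, `WATERMARK FOR time_col AS time_col`, or as the maximum observed timestamp with a delay, `WATERMARK FOR time_col AS time_col - INTERVAL 'string' time_unit`.

Supported `time_unit` values include: second, minute, hour, day, month, and year. For more details, see the `interval` data type under Overview of data types.
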
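To use nested fields (e.g., fields in a `STRUCT`) as the watermark column, or to perform expression evaluation on the input rows (e.g., casting data types), please refer to generated columns.

For example, the expression `order_time - INTERVAL '5' SECOND` generates the watermark as the latest timestamp observed in `order_time` minus 5 seconds.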
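
To see what the 5-second delay buys, it can help to compare it with a strategy that has no delay at all. The following Python sketch is an illustration under stated assumptions rather than a description of RisingWave internals: `WatermarkTracker` is a hypothetical helper, the `order_time` values are made up, and a row is treated as late when its timestamp is earlier than the watermark in effect when it arrives.

```python
from datetime import datetime, timedelta


class WatermarkTracker:
    """Tracks a watermark produced by a user-supplied expression (hypothetical helper)."""

    def __init__(self, expr):
        self.expr = expr      # maps an event timestamp to a candidate watermark
        self.current = None   # no watermark before the first row

    def observe(self, order_time):
        """Feed one row; return True if the row arrived behind the watermark."""
        late = self.current is not None and order_time < self.current
        candidate = self.expr(order_time)
        # Update only when the candidate is greater than the current watermark.
        if self.current is None or candidate > self.current:
            self.current = candidate
        return late


base = datetime(2024, 1, 1, 12, 0, 0)
# Made-up order_time values in arrival order; the third row is 3 seconds out of order.
order_times = [base, base + timedelta(seconds=10), base + timedelta(seconds=7)]

# Strategy 1: maximum observed timestamp, no delay.
no_delay = WatermarkTracker(lambda ts: ts)
# Strategy 2: maximum observed timestamp minus 5 seconds.
with_delay = WatermarkTracker(lambda ts: ts - timedelta(seconds=5))

late_no_delay = [no_delay.observe(ts) for ts in order_times]
late_with_delay = [with_delay.observe(ts) for ts in order_times]

print(late_no_delay)    # third row is late without a delay
print(late_with_delay)  # the 5-second delay tolerates the 3-second disorder
```

With no delay, any out-of-order row is immediately late. Subtracting an interval trades some result latency for tolerance of disorder up to that interval.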