What is stream processing?

Stream processing is the technique of continuously computing on data as it is produced or received, rather than storing it first and processing it later. In stream processing, data is treated as an unbounded, continuous flow of events — each event is processed as soon as it arrives, producing results in real time. Examples of streaming data include sensor readings from IoT devices, user clickstreams on a website, financial market trades, application logs, and database change events (CDC). All of these are naturally produced as continuous streams of events over time.

Stream processing vs. batch processing

The fundamental difference is when computation happens:
                         Batch processing                   Stream processing
When data is processed   After collection, on a schedule    Continuously, as data arrives
Latency                  Minutes to hours                   Milliseconds to seconds
Data model               Bounded datasets (files, tables)   Unbounded event streams
Typical tools            Spark, Hive, traditional SQL       RisingWave, Flink, Kafka Streams
Best for                 Historical analytics, reporting    Real-time monitoring, alerting, serving
Batch processing is ideal when you need to analyze large historical datasets and can tolerate delay. Stream processing is essential when you need results immediately — for monitoring, fraud detection, real-time personalization, or operational dashboards. Many modern architectures use both: stream processing for real-time results and batch processing for historical analysis and backfilling.

Core concepts in stream processing

Events

An event is a single record in a data stream — for example, a user click, a sensor reading, or a database row change. Events typically have a timestamp, a key, and a payload.
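For example, a clickstream can be ingested as a source whose columns map to the key, payload, and timestamp of each event. A sketch in RisingWave SQL (the topic name and broker address are illustrative):

-- Each Kafka message becomes one event with a key (user_id),
-- a payload (url), and a timestamp (clicked_at).
CREATE SOURCE clicks (
  user_id    BIGINT,
  url        VARCHAR,
  clicked_at TIMESTAMP
) WITH (
  connector = 'kafka',
  topic = 'clicks',
  properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;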

Windows

Windows group events by time for aggregation. Common window types include:
  • Tumble windows: Fixed-size, non-overlapping time intervals (e.g., every 1 minute).
  • Hop windows: Fixed-size, overlapping intervals (e.g., 5-minute windows sliding every 1 minute).
  • Session windows: Dynamic intervals based on event activity gaps.
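In RisingWave's SQL dialect, tumble and hop windows are expressed as table functions. A sketch, assuming a clicks source with a clicked_at timestamp column:

-- Tumble: one row per user per non-overlapping 1-minute window.
SELECT user_id, window_start, COUNT(*) AS click_count
FROM TUMBLE(clicks, clicked_at, INTERVAL '1 minute')
GROUP BY user_id, window_start;

-- Hop: 5-minute windows advancing every 1 minute, so each event
-- falls into five overlapping windows.
SELECT user_id, window_start, COUNT(*) AS click_count
FROM HOP(clicks, clicked_at, INTERVAL '1 minute', INTERVAL '5 minutes')
GROUP BY user_id, window_start;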

State

Stream processing often requires maintaining state across events — for example, running counts, sums, or the latest value per key. State management is one of the most challenging aspects of stream processing, as it must be durable, consistent, and recoverable from failures.
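As a sketch of stateful processing, the latest value per key can be maintained with a ranked subquery; the sensor_events source and its columns are assumptions for illustration. The engine keeps the per-key state internally and updates it as events arrive:

-- Keep only the most recent reading per sensor.
CREATE MATERIALIZED VIEW latest_reading AS
SELECT sensor_id, reading, reported_at
FROM (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY sensor_id ORDER BY reported_at DESC
  ) AS rank
  FROM sensor_events
) AS ranked
WHERE rank = 1;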

Watermarks and late events

In distributed systems, events may arrive out of order or late. Watermarks are a mechanism to track event-time progress and determine when a window can be closed. Stream processing systems must handle late events gracefully — either by updating existing results or discarding them.
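In RisingWave, a watermark can be declared on a source's time column. A sketch with an illustrative 5-second allowed lateness (source name and connector settings are assumptions):

-- Events more than 5 seconds behind the maximum observed event time
-- are treated as late, so windows can close deterministically.
CREATE SOURCE sensor_events (
  sensor_id   BIGINT,
  reading     DOUBLE PRECISION,
  reported_at TIMESTAMP,
  WATERMARK FOR reported_at AS reported_at - INTERVAL '5 seconds'
) WITH (
  connector = 'kafka',
  topic = 'sensor-events',
  properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;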

How RisingWave simplifies stream processing

Traditional stream processing frameworks like Apache Flink and Kafka Streams require writing Java/Scala code, managing state backends, deploying application JARs, and building separate serving layers. RisingWave simplifies this by providing stream processing as a database:
  • SQL-native: Define streaming pipelines with CREATE MATERIALIZED VIEW — no application code needed.
  • Built-in state management: State is stored in object storage (S3) via the Hummock engine — no external state backends.
  • Built-in query serving: Query materialized view results directly with SQL — no separate serving database needed.
  • PostgreSQL compatibility: Use psql, JDBC, or any PostgreSQL-compatible tool.

-- Stream processing in RisingWave: detect high-value orders in real time
CREATE MATERIALIZED VIEW high_value_orders AS
SELECT
  customer_id,
  window_start,
  SUM(amount) AS total_amount,
  COUNT(*) AS order_count
FROM TUMBLE(orders, order_time, INTERVAL '5 minutes')
GROUP BY customer_id, window_start
HAVING SUM(amount) > 10000;
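
Once the materialized view exists, its continuously updated results can be read with ordinary SQL, for example from psql:

-- Serve the latest windows directly, no separate serving database.
SELECT * FROM high_value_orders
ORDER BY window_start DESC
LIMIT 10;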

Common stream processing use cases

  • Real-time monitoring and alerting: Detect anomalies, threshold violations, or SLA breaches as they happen.
  • Fraud detection: Identify suspicious patterns across financial transactions in real time.
  • Real-time dashboards: Power live dashboards with continuously updated metrics.
  • Event-driven microservices: React to business events (orders, payments, sign-ups) immediately.
  • IoT and telemetry: Aggregate sensor data, compute rolling averages, and detect device failures.
  • Streaming ETL: Transform and enrich data in flight before delivering to data warehouses or lakes.
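
As an illustration of the streaming ETL case, a materialized view's output can be delivered downstream with a sink; the connector settings below are illustrative:

-- Continuously publish transformed results to a Kafka topic.
CREATE SINK high_value_orders_sink
FROM high_value_orders
WITH (
  connector = 'kafka',
  topic = 'high-value-orders',
  properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;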