Bring Your Own Iceberg (BYOI) covers setups in which external systems manage the Iceberg tables and RisingWave connects to them as a client. You already have Iceberg tables, typically created by Spark, Flink, batch ETL jobs, or other systems, and you want RisingWave to read from or write to them.

When to use Bring Your Own Iceberg

Choose this approach when:

  • Existing data lake: You already have Iceberg tables managed by other systems (Spark, Flink, dbt, etc.).
  • Multi-system architecture: Multiple applications/engines need to read/write the same Iceberg tables.
  • Integration requirements: You need to integrate RisingWave into existing data workflows and pipelines.
  • External catalog infrastructure: You have existing catalog services (AWS Glue, JDBC databases, REST catalogs) managing your Iceberg metadata.
  • Data lake ingestion: You want to stream processed data from RisingWave into your existing data lake.
  • Batch and stream hybrid: You want to combine batch processing (other systems) with stream processing (RisingWave) on the same tables.

Key capabilities

Read from Iceberg tables (Iceberg Source)

Ingest data from existing Iceberg tables into RisingWave for stream processing:

CREATE SOURCE my_iceberg_source
WITH (
    connector = 'iceberg',
    warehouse.path = 's3://my-data-lake/warehouse',
    database.name = 'my_db',
    table.name = 'user_events',
    catalog.type = 'glue',  -- Use AWS Glue as catalog
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-west-2'
);

  • Streaming ingestion: Continuously read new data as it’s added to the Iceberg table
  • Time travel: Read historical snapshots of the data
  • Schema evolution: Automatically handle schema changes in the source table
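
As a rough sketch of how the source defined above might be queried, the statements below read the current state, read two historical snapshots, and build a materialized view on top of the source. The timestamp and snapshot ID are placeholder values, `user_event_count` is a hypothetical name, and the `FOR SYSTEM_TIME AS OF` and `FOR SYSTEM_VERSION AS OF` clauses should be checked against the documentation for your RisingWave version.

```sql
-- Ad hoc read of the current table state.
SELECT count(*) FROM my_iceberg_source;

-- Time travel by timestamp (placeholder value).
SELECT count(*) FROM my_iceberg_source
FOR SYSTEM_TIME AS OF '2024-01-01 00:00:00+00:00';

-- Time travel by Iceberg snapshot ID (placeholder value).
SELECT count(*) FROM my_iceberg_source
FOR SYSTEM_VERSION AS OF 1234567890123456789;

-- Streaming ingestion: keep a running row count up to date
-- as new data is committed to the Iceberg table.
CREATE MATERIALIZED VIEW user_event_count AS
SELECT count(*) AS total_events
FROM my_iceberg_source;
```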

Write to Iceberg tables (Iceberg Sink)

Stream processed results from RisingWave into existing Iceberg tables:

CREATE SINK my_iceberg_sink FROM processed_data
WITH (
    connector = 'iceberg',
    warehouse.path = 's3://my-data-lake/warehouse',
    database.name = 'analytics',
    table.name = 'aggregated_metrics',
    catalog.type = 'rest',
    catalog.uri = 'http://my-catalog:8181',
    type = 'upsert',  -- Sink mode: 'append-only' or 'upsert'
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key'
);

  • Multiple sink modes: Append-only or upsert depending on your use case
  • Exactly-once delivery: Ensure data consistency with configurable delivery semantics
  • External accessibility: Data written by RisingWave becomes visible to other systems once each Iceberg commit completes
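
For comparison with the upsert example, here is a minimal sketch of an append-only sink into the same table. The sink name `my_append_only_sink` is hypothetical, and `force_append_only` is assumed to be needed only when `processed_data` can emit updates or deletes. Verify both options against the sink reference for your version.

```sql
CREATE SINK my_append_only_sink FROM processed_data
WITH (
    connector = 'iceberg',
    warehouse.path = 's3://my-data-lake/warehouse',
    database.name = 'analytics',
    table.name = 'aggregated_metrics',
    catalog.type = 'rest',
    catalog.uri = 'http://my-catalog:8181',
    type = 'append-only',
    -- Assumption: the upstream is not append-only, so updates and
    -- deletes are dropped and only inserts reach the Iceberg table.
    force_append_only = 'true',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key'
);
```

Append-only mode suits event logs and immutable facts. Upsert mode is the better fit when downstream readers expect one current row per key.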

Supported catalog types

BYOI works with various external catalog systems:

  • AWS Glue: Managed metadata service on AWS
  • JDBC catalogs: PostgreSQL, MySQL, or other JDBC-compatible databases storing metadata
  • REST catalogs: RESTful catalog services including AWS S3 Tables
  • Storage catalogs: Direct filesystem-based metadata (S3/HDFS)
  • Hive Metastore: Traditional Hadoop ecosystem catalog
  • Snowflake: Snowflake-managed Iceberg catalogs
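
Switching catalogs mostly means changing the `catalog.*` options. The sketch below shows what a JDBC-backed and a Hive Metastore-backed source might look like. Host names, credentials, the catalog name `demo`, and the source names are placeholders. The `catalog.name`, `catalog.jdbc.user`, and `catalog.jdbc.password` options are assumptions to confirm against the catalog configuration reference.

```sql
-- JDBC catalog backed by PostgreSQL (all connection details are placeholders).
CREATE SOURCE iceberg_via_jdbc
WITH (
    connector = 'iceberg',
    warehouse.path = 's3://my-data-lake/warehouse',
    database.name = 'my_db',
    table.name = 'user_events',
    catalog.type = 'jdbc',
    catalog.name = 'demo',
    catalog.uri = 'jdbc:postgresql://my-postgres:5432/iceberg',
    catalog.jdbc.user = 'catalog-user',
    catalog.jdbc.password = 'catalog-password',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-west-2'
);

-- Hive Metastore catalog (the Thrift endpoint is a placeholder).
CREATE SOURCE iceberg_via_hive
WITH (
    connector = 'iceberg',
    warehouse.path = 's3://my-data-lake/warehouse',
    database.name = 'my_db',
    table.name = 'user_events',
    catalog.type = 'hive',
    catalog.uri = 'thrift://my-metastore:9083',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-west-2'
);
```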

Architecture patterns

Lambda/Kappa architecture

  • Batch layer: Spark/Flink writes to Iceberg tables.
  • Speed layer: RisingWave reads from Iceberg (batch results) and streams real-time updates.
  • Serving layer: Analytics tools query the combined results.

Multi-engine data lake

  • Multiple writers: Different systems write to the same Iceberg tables.
  • Multiple readers: Various engines read from shared tables.
  • RisingWave role: Provides real-time streaming capabilities to the data lake.

ETL/ELT pipelines

  • Extract: RisingWave reads from various sources.
  • Transform: Stream processing in RisingWave.
  • Load: Write results to existing data lake via Iceberg sink.
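
The extract and transform steps could be sketched as follows, assuming a hypothetical Kafka topic `orders` with JSON messages. The source name `raw_orders`, its columns, and the broker address are illustrative only. The resulting `processed_data` view is the kind of object that a `CREATE SINK ... FROM processed_data` statement, like the one shown earlier, would load into Iceberg.

```sql
-- Extract: read raw events from a Kafka topic (hypothetical schema).
CREATE SOURCE raw_orders (
    order_id BIGINT,
    amount DOUBLE PRECISION,
    created_at TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'broker:9092',
    scan.startup.mode = 'earliest'
)
FORMAT PLAIN ENCODE JSON;

-- Transform: aggregate orders into one-minute tumbling windows.
CREATE MATERIALIZED VIEW processed_data AS
SELECT
    window_start,
    count(*) AS order_count,
    sum(amount) AS total_amount
FROM TUMBLE(raw_orders, created_at, INTERVAL '1 minute')
GROUP BY window_start;

-- Load: an Iceberg sink created FROM processed_data writes these rows
-- to the existing table. The target table's schema must match the
-- view's columns.
```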

Next steps

  1. Identify your catalog: Determine what catalog system manages your existing Iceberg tables.
  2. Start with reading: Create an Iceberg source to ingest existing data.
  3. Add streaming outputs: Set up Iceberg sinks to export processed results.
  4. Configure catalogs: Review catalog configuration for your specific setup.

Comparing approaches: If you want RisingWave to create and manage Iceberg tables directly, see RisingWave Managed Iceberg instead.