The Hosted Iceberg Catalog is a built-in catalog service for the Iceberg table engine. It simplifies the process of creating Iceberg tables by eliminating the need to set up, configure, and manage an external catalog service like AWS Glue, a separate JDBC database, or a REST service. For users new to Iceberg, or for those who want to reduce operational overhead, the hosted catalog provides the quickest way to get started with the Iceberg table engine in RisingWave.

How it works

When you enable the hosted catalog in an Iceberg connection, RisingWave uses its internal metastore (typically a PostgreSQL instance) as a standard Iceberg JDBC catalog.

  • Metadata for tables created with the ENGINE = iceberg clause is stored in two system views in RisingWave: iceberg_tables and iceberg_namespace_properties.
  • The implementation is not a proprietary format; it follows the standard Iceberg JDBC catalog protocol.
  • As a result, the tables remain open and accessible to external tools, such as Spark, Trino, and Flink, that can connect to a JDBC catalog.

Create a connection with the hosted catalog

To use the hosted catalog, you create an Iceberg connection and set the hosted_catalog parameter to true.

Syntax

CREATE CONNECTION connection_name
WITH (
    type = 'iceberg',
    warehouse.path = 's3://<your_bucket_name>/<path_to_warehouse>',
    s3.region = '<your_s3_region>',
    s3.access.key = '<your_iam_access_key>',
    s3.secret.key = '<your_iam_secret_key>',
    s3.endpoint = '<your_s3_endpoint>', -- Optional
    hosted_catalog = true
);

Parameters

  • hosted_catalog: Required. Set to true to enable the Hosted Iceberg Catalog for this connection. This instructs RisingWave to manage the catalog metadata internally.
  • warehouse.path: Required. The S3 path to your Iceberg warehouse, where table data and metadata will be stored.
  • s3.region: Required. The AWS region of your S3 bucket.
  • s3.access.key: Required. The Access Key ID for your IAM user. For enhanced security, you can use secrets.
  • s3.secret.key: Required. The Secret Access Key for your IAM user. For enhanced security, you can use secrets.
  • s3.endpoint: Optional. The endpoint for S3-compatible object storage.
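Instead of embedding credentials in plaintext, the access keys can reference RisingWave secrets. A minimal sketch, assuming a secret created with CREATE SECRET and the meta backend (the names my_s3_secret_key and secure_iceberg_conn are illustrative):

```sql
-- Hypothetical names: my_s3_secret_key, secure_iceberg_conn.
CREATE SECRET my_s3_secret_key WITH (backend = 'meta') AS '<your_iam_secret_key>';

CREATE CONNECTION secure_iceberg_conn
WITH (
    type = 'iceberg',
    warehouse.path = 's3://<your_bucket_name>/<path_to_warehouse>',
    s3.region = '<your_s3_region>',
    s3.access.key = '<your_iam_access_key>',
    s3.secret.key = secret my_s3_secret_key,  -- reference the secret by name
    hosted_catalog = true
);
```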

Example: Creating and using a table

Here is a complete walkthrough of creating a connection, setting it as the default for the Iceberg engine, and creating a table.

Step 1: Create the connection

Create an Iceberg connection with the hosted catalog enabled.

CREATE CONNECTION my_hosted_catalog_conn
WITH (
    type = 'iceberg',
    warehouse.path = 's3://my-iceberg-bucket/hosted-warehouse',
    s3.region = 'us-east-1',
    s3.access.key = 'xxxxx',
    s3.secret.key = 'yyyyy',
    hosted_catalog = true
);

Step 2: Set the active connection

Configure the Iceberg table engine to use the connection you just created.

SET iceberg_engine_connection = 'public.my_hosted_catalog_conn';
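To double-check which connection the current session will use, you can read the variable back (assuming PostgreSQL-style SHOW support for session variables):

```sql
SHOW iceberg_engine_connection;
```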

Step 3: Create and populate the table

Now you can create a table using ENGINE = iceberg. RisingWave will use the hosted catalog to manage its metadata.

CREATE TABLE t_hosted_catalog (
    id INT PRIMARY KEY, 
    name VARCHAR
) ENGINE = iceberg;

INSERT INTO t_hosted_catalog VALUES (1, 'RisingWave');
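As a quick sanity check, confirm the table is queryable from RisingWave itself:

```sql
SELECT * FROM t_hosted_catalog;
```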

Accessing the hosted catalog from external tools

Since the hosted catalog is a standard JDBC catalog, you can connect external query engines to read the tables managed by RisingWave.

Authentication and authorization

External tools can connect to the hosted catalog using a RisingWave database user account and password. The SELECT privileges of the user account determine which Iceberg tables can be accessed.
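For example, you might create a dedicated user for external engines and grant it read access only to the tables it should see. A sketch with a hypothetical user name (catalog_reader):

```sql
-- Hypothetical user for external catalog access.
CREATE USER catalog_reader WITH PASSWORD '<password>';
-- Only tables the user can SELECT are accessible through the catalog.
GRANT SELECT ON TABLE t_hosted_catalog TO catalog_reader;
```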

Schema mapping

When accessing the tables from an external tool, use the following mapping between Iceberg and RisingWave naming conventions:

Iceberg         RisingWave
--------------  -------------
catalog name    database name
namespace       schema
table           table

Example: Connecting with Spark

You can configure a Spark session to read the t_hosted_catalog table created in the example above. The command below assumes the table was created in the RisingWave database dev and schema public.

# Launch Spark SQL with necessary packages and configurations
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,org.postgresql:postgresql:42.7.4 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.dev=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.dev.catalog-impl=org.apache.iceberg.jdbc.JdbcCatalog \
    --conf spark.sql.catalog.dev.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.dev.warehouse='s3://my-iceberg-bucket/hosted-warehouse' \
    --conf spark.sql.catalog.dev.uri="jdbc:postgresql://<risingwave-hostname>:<port>/dev" \
    --conf spark.sql.catalog.dev.jdbc.user="<your_rw_user>" \
    --conf spark.sql.catalog.dev.jdbc.password="<your_rw_password>"

Once connected, you can query the table using its three-part name (catalog.namespace.table):

spark-sql> SELECT * FROM dev.public.t_hosted_catalog;
+---+------------+
| id|        name|
+---+------------+
|  1|  RisingWave|
+---+------------+
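From the same Spark session, you can also browse what the catalog exposes:

```sql
-- List RisingWave schemas (Iceberg namespaces) and their tables.
SHOW NAMESPACES IN dev;
SHOW TABLES IN dev.public;
```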

Inspecting the catalog within RisingWave

You can query the system views iceberg_tables and iceberg_namespace_properties directly in RisingWave to see the catalog's metadata. The catalog exposed by RisingWave is read-only: you can query these views, but you cannot modify catalog metadata through them.

SELECT * FROM iceberg_tables;
 catalog_name | table_namespace |    table_name    |                                                      metadata_location                                                       | previous_metadata_location | iceberg_type
--------------+-----------------+------------------+------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------
 dev          | public          | t_hosted_catalog | s3://hummock001/iceberg_connection/public/t_hosted_catalog/metadata/00000-e267a2ad-daf1-4d05-b7bd-75ff30e17629.metadata.json |                            |
(1 row)
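The companion view holds namespace-level properties and can be inspected the same way:

```sql
SELECT * FROM iceberg_namespace_properties;
```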
