Hosted Iceberg Catalog
Learn how to use the Hosted Iceberg Catalog in RisingWave. This guide shows you how to create and manage Iceberg tables without configuring or maintaining an external catalog service such as AWS Glue or a JDBC database.
The Hosted Iceberg Catalog is a built-in catalog service for the Iceberg table engine. It lets you create Iceberg tables without setting up, configuring, and managing an external catalog service such as AWS Glue, a separate JDBC database, or a REST service. For users new to Iceberg, or those who want to reduce operational overhead, the hosted catalog is the quickest way to get started with the Iceberg table engine in RisingWave.
How it works
When you enable the hosted catalog in an Iceberg connection, RisingWave uses its internal metastore (typically a PostgreSQL instance) as a standard Iceberg JDBC Catalog.
- Table metadata for tables created with the `ENGINE = iceberg` clause is stored within two system views in RisingWave: `iceberg_tables` and `iceberg_namespace_properties`.
- This implementation is not a proprietary format; it adheres to the standard Iceberg JDBC Catalog protocol.
- This ensures that the tables remain open and accessible to external tools like Spark, Trino, and Flink that can connect to a JDBC catalog.
Create a connection with the hosted catalog
To use the hosted catalog, create an Iceberg connection and set the `hosted_catalog` parameter to `true`.
Syntax
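The statement takes roughly the following shape. This is a sketch rather than an authoritative grammar: `connection_name` is a placeholder, and the `type` and `warehouse.path` parameter names are assumed from RisingWave's Iceberg connection options, so check them against the connection reference for your version.

```sql
-- Template only: replace the angle-bracket placeholders before running.
CREATE CONNECTION connection_name WITH (
    type = 'iceberg',
    warehouse.path = '<storage_path>',
    <object_storage_parameters>,
    hosted_catalog = true
);
```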
Where `<storage_path>` and `<object_storage_parameters>` depend on your chosen storage backend (S3, GCS, or Azure Blob). See the object storage configuration for specific parameter details.
Parameters
For object storage parameters, see the object storage configuration for your storage backend (S3, GCS, or Azure Blob).
Hosted catalog-specific parameters
Field | Description |
---|---|
hosted_catalog | Required. Set to true to enable the Hosted Iceberg Catalog for this connection. This instructs RisingWave to manage the catalog metadata internally. |
Example: Creating and using a table
Here is a complete walkthrough of creating a connection, setting it as the default for the Iceberg engine, and creating a table.
Step 1: Create the connection
Create an Iceberg connection with the hosted catalog enabled.
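A minimal sketch of such a connection is shown here. The connection name `my_hosted_catalog_conn`, the bucket path, the region, and the credentials are all hypothetical, and the `s3.*` parameter names are assumed from RisingWave's S3 options for Iceberg.

```sql
-- Hypothetical names and credentials; replace them with your own.
CREATE CONNECTION my_hosted_catalog_conn WITH (
    type = 'iceberg',
    warehouse.path = 's3://my-bucket/warehouse/',  -- assumed bucket and prefix
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-west-2',
    hosted_catalog = true  -- RisingWave manages the catalog metadata internally
);
```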
This example uses S3 for object storage. You can also use Google Cloud Storage (GCS) or Azure Blob Storage by replacing the S3 parameters with the appropriate parameters for your chosen storage backend. See the object storage configuration for details.
Step 2: Set the active connection
Configure the Iceberg table engine to use the connection you just created.
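One way to do this is with a session variable. The sketch below assumes the variable is named `iceberg_engine_connection`, that it takes a schema-qualified connection name, and that the hypothetical connection `my_hosted_catalog_conn` was created in the `public` schema.

```sql
-- Assumes the connection from Step 1 lives in the public schema.
SET iceberg_engine_connection = 'public.my_hosted_catalog_conn';
```

Because `SET` applies to the current session, tables created later in the same session would pick up this connection.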
Step 3: Create and populate the table
Now you can create a table using `ENGINE = iceberg`. RisingWave will use the hosted catalog to manage its metadata.
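For illustration, the following sketch creates and populates a small table. The table name `t_hosted_catalog` matches the one used in the Spark example later in this guide; the column definitions and the inserted row are assumptions made for the example.

```sql
-- Create an Iceberg-backed table; its metadata goes to the hosted catalog.
CREATE TABLE t_hosted_catalog (
    id INT PRIMARY KEY,
    name VARCHAR
) ENGINE = iceberg;

-- Write a sample row.
INSERT INTO t_hosted_catalog VALUES (1, 'RisingWave');

-- Read it back. Rows may take a moment to appear, depending on how
-- often the engine commits to Iceberg.
SELECT * FROM t_hosted_catalog;
```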
Accessing the hosted catalog from external tools
Since the hosted catalog is a standard JDBC catalog, you can connect external query engines to read the tables managed by RisingWave.
Authentication and authorization
External tools can connect to the hosted catalog using a RisingWave database user account and password. The SELECT privileges of the user account determine which Iceberg tables can be accessed.
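As a sketch of how a dedicated read-only account might be set up, the statements below create a user and grant it read access to one table. The user name `spark_reader` and its password are hypothetical, and depending on your RisingWave version additional privileges (for example on the schema) may be required.

```sql
-- Hypothetical account for an external engine such as Spark.
CREATE USER spark_reader WITH PASSWORD 'change-me';

-- Only tables this user can SELECT from are readable through the catalog.
GRANT SELECT ON TABLE t_hosted_catalog TO spark_reader;
```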
Schema mapping
When accessing the tables from an external tool, use the following mapping between Iceberg and RisingWave naming conventions:
Iceberg | RisingWave |
---|---|
catalog name | database name |
namespace | schema |
table | table |
Example: Connecting with Spark
You can configure a Spark session to read the `t_hosted_catalog` table created in the example above. This example assumes the table was created in the RisingWave database named `dev` and schema `public`.
Once connected, you can query the table using its three-part name (`catalog.namespace.table`):
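A minimal Spark SQL sketch, assuming the Spark session already has a catalog named `dev` configured with Iceberg's JDBC catalog implementation and pointed at RisingWave:

```sql
-- catalog = RisingWave database, namespace = schema, table = table
SELECT * FROM dev.public.t_hosted_catalog;
```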
Inspecting the catalog within RisingWave
You can query the system views `iceberg_tables` and `iceberg_namespace_properties` directly in RisingWave to see the catalog’s metadata. The catalog exposed by RisingWave is read-only.
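For example, the following queries list the contents of both views. `SELECT *` is used here so that the sketch does not assume specific column names.

```sql
-- One row per Iceberg table tracked by the hosted catalog.
SELECT * FROM iceberg_tables;

-- Key/value properties attached to each namespace.
SELECT * FROM iceberg_namespace_properties;
```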