Hosted Iceberg Catalog
Learn how to use the Hosted Iceberg Catalog in RisingWave. This guide shows you how to create and manage Iceberg tables without the need to configure or maintain an external catalog service like AWS Glue or a JDBC database.
The Hosted Iceberg Catalog is a built-in catalog service for the Iceberg table engine. It simplifies the process of creating Iceberg tables by eliminating the need to set up, configure, and manage an external catalog service like AWS Glue, a separate JDBC database, or a REST service. For users new to Iceberg, or for those who want to reduce operational overhead, the hosted catalog provides the quickest way to get started with the Iceberg table engine in RisingWave.
How it works
When you enable the hosted catalog in an Iceberg connection, RisingWave utilizes its internal metastore (which is typically a PostgreSQL instance) to function as a standard Iceberg JDBC Catalog.
- Table metadata for tables created with the
ENGINE = iceberg
clause is stored within two system views in RisingWave:iceberg_tables
andiceberg_namespace_properties
. - This implementation is not a proprietary format; it adheres to the standard Iceberg JDBC Catalog protocol.
- This ensures that the tables remain open and accessible to external tools like Spark, Trino, and Flink that can connect to a JDBC catalog.
Create a connection with the hosted catalog
To use the hosted catalog, you create an Iceberg connection and set the hosted_catalog
parameter to true
.
Syntax
Where <storage_path>
and <object_storage_parameters>
depend on your chosen storage backend (S3, GCS, or Azure Blob). See the object storage configuration for specific parameter details.
Parameters
For object storage configuration parameters, see:
Hosted catalog-specific parameters
Field | Description |
---|---|
hosted_catalog | Required. Set to true to enable the Hosted Iceberg Catalog for this connection. This instructs RisingWave to manage the catalog metadata internally. |
Example: Creating and using a table
Here is a complete walkthrough of creating a connection, setting it as the default for the Iceberg engine, and creating a table.
Benefits of the hosted catalog
- Zero external dependencies: No need to set up AWS Glue, PostgreSQL, or other catalog services
- Rapid prototyping: Get started with Iceberg immediately without infrastructure setup
- Standard compliance: Uses the standard Iceberg JDBC catalog protocol for compatibility
- External accessibility: Tables can be accessed by external Iceberg-compatible tools
- Reduced complexity: Fewer moving parts in your data architecture
External access to hosted catalog tables
Since the hosted catalog implements the standard Iceberg JDBC catalog protocol, external tools can access your tables by connecting to RisingWave’s metastore.
Spark example
Trino example
Add this to your Trino catalog configuration:
Then query from Trino:
When to use external catalogs instead
While the hosted catalog is great for getting started, you might want to use external catalogs when:
- Multi-system environments: Multiple systems need to share the same catalog metadata
- Enterprise requirements: You need integration with existing catalog infrastructure (AWS Glue, etc.)
- Governance: You have strict data governance requirements that mandate specific catalog systems
- Scale: You’re managing hundreds or thousands of Iceberg tables across multiple systems
For external catalog configurations, see Catalog configuration.
System tables
When using the hosted catalog, you can inspect the catalog metadata through RisingWave’s system tables:
Best practices
- Development and testing: Use the hosted catalog for development, prototyping, and testing scenarios
- Simple production workloads: Good for production workloads where RisingWave is the primary system managing Iceberg tables
- Backup your metadata: Since metadata is stored in RisingWave’s metastore, ensure you have proper backup procedures
- Monitor storage growth: Keep an eye on metastore storage as you create more tables
- Plan for scale: Consider external catalogs if you anticipate managing many tables or integrating with multiple systems
Next steps
- Create your first table: Follow the Iceberg Table Engine guide for detailed table creation and usage.
- Configure object storage: Review Object storage configuration for your storage backend.
- Explore external access: Test connecting external tools like Spark or Trino to your hosted catalog tables.