Database isolation and resource groups
Improve stability and fault tolerance in RisingWave using database isolation. Learn about baseline workload resilience and how resource groups assign databases to dedicated compute pools for fine-grained control and performance tuning.
Running multiple databases or workloads within a single RisingWave cluster provides operational efficiency. However, without proper isolation, issues in one workload (like a failing streaming job or a crashing compute node) can disrupt others and affect overall cluster stability.
Starting from v2.3, RisingWave introduces significant enhancements to improve workload isolation:
- Enhanced baseline database isolation: Core improvements that make databases more resilient to localized issues by default.
- Resource groups (Premium Feature): A powerful mechanism for fine-grained control over fault isolation and resource allocation by assigning databases to specific pools of compute nodes.
This topic explains both levels of isolation available in RisingWave.
Enhanced database isolation (baseline)
Available to all users starting from RisingWave v2.3, these core improvements provide a more resilient environment for multi-database deployments out-of-the-box:
- Independent checkpoints: The checkpointing process, crucial for maintaining consistent state, now operates more independently for each database (assuming no direct cross-database dependencies). A checkpoint problem triggered by a job in database_A is less likely to halt or affect the checkpoint process for unrelated jobs in database_B.
- Localized job error recovery: If a streaming job within a specific database fails (due to internal errors, UDF issues, etc.), the necessary recovery actions are now primarily contained within that database. Unrelated streaming jobs in other databases can continue operating without disruption.
Benefits:
- Increased stability: Reduces the likelihood of cascading failures across unrelated databases.
- Improved availability: Localized issues have a smaller impact radius, allowing unaffected workloads to continue running.
Example scenario: If the Marketing team (marketing_db) runs an experimental UDF that fails, the enhanced isolation prevents this failure from stalling checkpoints or disrupting critical jobs running in the Sales team’s database (sales_ops_db).
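A minimal sketch of the setup behind this scenario: each team's workload lives in its own database, so the baseline isolation applies between them. The database names are the illustrative ones from the scenario.

```sql
-- Separate databases per team; the names are illustrative.
CREATE DATABASE marketing_db;
CREATE DATABASE sales_ops_db;
```

Streaming jobs created in marketing_db then fail and recover independently of those in sales_ops_db, as long as no job depends on objects across the two databases.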
Resource groups
Resource groups provide a higher level of isolation and control, particularly useful for critical workloads, multi-tenant environments, or performance optimization.
PREMIUM EDITION FEATURE
Resource groups is a Premium Edition feature. All Premium Edition features are available out of the box without additional cost on RisingWave Cloud. For self-hosted deployments, users need to purchase a license key to access this feature. To purchase a license key, please contact the sales team at sales@risingwave-labs.com.
For a full list of Premium Edition features, see RisingWave Premium Edition.
PUBLIC PREVIEW
Resource groups is currently in public preview, meaning it is nearing the final product but may not yet be fully stable. If you encounter any issues or have feedback, please reach out to us via our Slack channel. Your input is valuable in helping us improve this feature. For more details, see our Public Preview Feature List.
What are resource groups?
A resource group is a named pool consisting of specific compute nodes (CNs) within your RisingWave cluster. You define these groups (typically via cluster configuration or administrative commands) and can then assign databases to them.
How do resource groups work?
When you assign a database to a specific resource group during creation, all streaming jobs belonging to that database are constrained to run only on the compute nodes within that designated group.
Primary benefits:
- Granular fault isolation (limiting the “blast radius”):
  - By creating resource groups using distinct sets of compute nodes and assigning independent databases to separate groups, you achieve robust infrastructure fault isolation.
  - If a compute node within resource_group_A crashes, it will only impact the databases assigned to resource_group_A. Databases assigned to resource_group_B (running on different CNs) remain unaffected by that specific hardware failure.
- Workload matching & performance optimization:
  - Allocate workloads to hardware best suited for their needs.
  - Example: Create a high_cpu_pool resource group using CNs with powerful CPUs and a high_memory_pool resource group using CNs with large amounts of RAM. Assign CPU-intensive databases (e.g., real-time fraud detection) to the former and memory-intensive databases (e.g., large stateful aggregations) to the latter. This optimizes performance and resource utilization.
- Resource governance: Gain finer control and visibility over how compute resources are consumed by different applications or tenants.
Example scenario: A financial institution runs a critical, CPU-intensive fraud detection system (fraud_db) and a memory-intensive risk calculation system (risk_db). By creating rg_fraud (on high-CPU nodes) and rg_risk (on high-memory nodes) and assigning the databases respectively, they achieve:
- Hardware isolation: A CN failure in rg_risk does not impact fraud detection.
- Performance tuning: Each workload runs on optimal hardware.
- Resource predictability: Critical fraud_db is shielded from resource contention from risk_db.
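As a sketch, the database assignment in this scenario might look like the following. It assumes that the resource groups rg_fraud and rg_risk already exist and that the clause is written as WITH RESOURCE_GROUP = name; check the CREATE DATABASE reference for your version for the exact form.

```sql
-- Assumes rg_fraud and rg_risk have already been set up on the matching nodes.
CREATE DATABASE fraud_db WITH RESOURCE_GROUP = rg_fraud;
CREATE DATABASE risk_db WITH RESOURCE_GROUP = rg_risk;
```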
How to use resource groups
- Assigning a database to a resource group
Use the WITH RESOURCE_GROUP clause when creating a database:
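The general form is sketched below. The placeholder names match the parameter descriptions that follow; the = between the clause and the group name is an assumption, so verify it against the CREATE DATABASE reference for your version.

```sql
-- database_name and resource_group_name are placeholders.
CREATE DATABASE database_name WITH RESOURCE_GROUP = resource_group_name;
```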
database_name: The name of the database to create.
resource_group_name: The name of an existing resource group to assign this database to. The group must exist before creating the database with this clause.
If the WITH RESOURCE_GROUP clause is omitted, the database will typically be assigned to a default resource group, which might contain all available compute nodes not assigned to other specific groups, thus sharing resources more broadly.
- Monitoring resource groups and job assignment
You can use system catalogs to view information about resource groups and job assignments:
rw_resource_groups: Lists the defined resource groups and summarizes their usage.
rw_streaming_jobs: Shows details about active streaming jobs, including their assigned resource group.
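Both catalogs can be queried like ordinary relations. The sketch below selects all columns because the exact column names are not listed here and may differ between versions.

```sql
-- List the defined resource groups and their usage summary.
SELECT * FROM rw_resource_groups;

-- List active streaming jobs, including the resource group each one is assigned to.
SELECT * FROM rw_streaming_jobs;
```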
When to use resource groups
While the baseline isolation benefits all users, consider using the premium resource groups feature specifically for:
- Multi-tenant deployments: Provide stronger isolation guarantees (both logical and potentially physical) between tenants housed in separate databases.
- Critical workloads: Ensure maximum resilience and predictable performance for essential streaming jobs by dedicating hardware pools and limiting the impact of unrelated failures.
- Performance tuning: Optimize resource usage by matching specific workload characteristics (CPU-bound, memory-bound) to specialized hardware pools (if available).
- Resource governance: Implement clearer boundaries and accountability for resource consumption across different teams or applications.
Availability
- Enhanced baseline database isolation: Included by default in RisingWave v2.3 and later for all editions.
- Resource groups: Available as a Premium Edition feature in RisingWave v2.3 and later.