Back up and restore meta service

This guide explains how to back up meta service data and restore it from a backup.

A meta snapshot is a backup of the meta service's data at a specific point in time. Meta snapshots are persisted in S3-compatible storage.

Set backup parameters

Before you create the first meta snapshot, set the backup_storage_url and backup_storage_directory system parameters.

caution

Avoid changing backup_storage_url or backup_storage_directory once snapshots exist. Doing so is not strictly forbidden, but all snapshots taken before the change are invalidated and can no longer be used for restoration.

To learn how to configure system parameters, see How to configure system parameters.
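
As a minimal sketch of this step, the shell function below prints the two statements that set the backup parameters. It assumes that system parameters are changed with ALTER SYSTEM SET, as described in the system parameters guide. The function name backup_param_sql and the example bucket and directory values are illustrative and not part of RisingWave.

```shell
# Print SQL statements that set the two backup parameters.
# Assumption: system parameters are changed with ALTER SYSTEM SET.
backup_param_sql() {
  # $1: backup storage URL, $2: backup storage directory
  printf "ALTER SYSTEM SET backup_storage_url TO '%s';\n" "$1"
  printf "ALTER SYSTEM SET backup_storage_directory TO '%s';\n" "$2"
}

# Review the statements before running them in a SQL client.
backup_param_sql 's3://backup_bucket' 'backup_data'
```

The printed statements can be pasted into a SQL client such as psql that is connected to the cluster.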

Create a meta snapshot

The meta service creates a meta snapshot only when a user requests one. The RisingWave kernel has no automatic process that creates meta snapshots on a schedule.

Here's an example of how to create a new meta snapshot with risectl:

risectl meta backup-meta

risectl is included in the pre-built RisingWave binary. For details, see Quick start.
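
Because snapshots are only created on request, periodic backups have to be triggered from outside RisingWave. The following is a minimal sketch of a wrapper that a scheduler such as cron could call. The names run_meta_backup and RISECTL_BIN are hypothetical, and the sketch assumes risectl is already configured to reach the meta service.

```shell
# Run one on-demand meta backup, failing early if risectl is missing.
# RISECTL_BIN is a hypothetical override for the path to risectl.
run_meta_backup() {
  risectl_bin="${RISECTL_BIN:-risectl}"
  if ! command -v "$risectl_bin" >/dev/null 2>&1; then
    echo "risectl not found: $risectl_bin" >&2
    return 127
  fi
  # Same command as in the example above.
  "$risectl_bin" meta backup-meta
}
```

Calling run_meta_backup from a scheduled job returns the exit status of risectl, so the scheduler can report failed backups.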

View existing meta snapshots

The following SQL command lists existing meta snapshots:

SELECT meta_snapshot_id FROM rw_catalog.rw_meta_snapshot;

Example output:

 meta_snapshot_id
------------------
3
4

Delete a meta snapshot

Here's an example of how to delete a meta snapshot with risectl:

risectl meta delete-meta-snapshots [snapshot_ids]

Restore from a meta snapshot

The restore procedure depends on the meta store backend. The two methods below cover a SQL database and etcd.

SQL database as meta store backend

If the cluster has been using a SQL database as meta store backend, follow these steps to restore from a meta snapshot.

  1. Shut down the meta service.

    note

    This step is especially important because the meta backup and recovery process does not replicate SST files. Multiple clusters must never run against the same set of SSTs at the same time, as this can corrupt the SST files.

  2. Create a new meta store, i.e. a new SQL database instance.

    The new SQL database instance must define exactly the same tables as the original, and all tables must be empty. One way to achieve this is to create the tables with the schema migration tool, then truncate any tables that the tool populated.

  3. Restore the meta snapshot to the new meta store.

    risectl \
    meta \
    restore-meta \
    --meta-store-type sql \
    --meta-snapshot-id [snapshot_id] \
    --sql-endpoint [sql_endpoint] \
    --backup-storage-url [backup_storage_url, e.g. s3://bucket_read_from] \
    --backup-storage-directory [backup_storage_directory, e.g. dir_read_from] \
    --hummock-storage-url [hummock_storage_url, e.g. s3://bucket_write_to] \
    --hummock-storage-directory [hummock_storage_directory, e.g. dir_write_to]

    restore-meta reads snapshot data from backup storage and writes it to the meta store and hummock storage.

    For example, given the cluster settings below:

    psql=> show parameters;
               Name           |           Value           | Mutable
    --------------------------+---------------------------+---------
     state_store              | hummock+s3://state_bucket | f
     data_directory           | state_data                | f
     backup_storage_url       | s3://backup_bucket        | t
     backup_storage_directory | backup_data               | t

    The parameters passed to risectl meta restore-meta should be:

    • --backup-storage-url s3://backup_bucket
    • --backup-storage-directory backup_data
    • --hummock-storage-url s3://state_bucket (note that the hummock+ prefix is stripped)
    • --hummock-storage-directory state_data
  4. Configure meta service to use the new meta store.
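
The parameter mapping in step 3 can be sketched in shell. The values are copied from the example show parameters output. The snapshot ID 3 and the [sql_endpoint] placeholder are stand-ins to replace, and the command is only printed for review, not executed.

```shell
# Values copied from the example `show parameters` output.
state_store='hummock+s3://state_bucket'
data_directory='state_data'
backup_storage_url='s3://backup_bucket'
backup_storage_directory='backup_data'

# Stand-ins: replace with a real snapshot ID and SQL endpoint.
snapshot_id='3'
sql_endpoint='[sql_endpoint]'

# --hummock-storage-url is state_store without the "hummock+" prefix.
hummock_storage_url="${state_store#hummock+}"

# Assemble the command on one line and print it for review.
restore_cmd="risectl meta restore-meta --meta-store-type sql"
restore_cmd="$restore_cmd --meta-snapshot-id $snapshot_id"
restore_cmd="$restore_cmd --sql-endpoint $sql_endpoint"
restore_cmd="$restore_cmd --backup-storage-url $backup_storage_url"
restore_cmd="$restore_cmd --backup-storage-directory $backup_storage_directory"
restore_cmd="$restore_cmd --hummock-storage-url $hummock_storage_url"
restore_cmd="$restore_cmd --hummock-storage-directory $data_directory"
printf '%s\n' "$restore_cmd"
```

Deriving the hummock storage URL with parameter expansion avoids retyping the bucket name, which is where a mismatch between the cluster settings and the restore command is most likely to creep in.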

etcd as meta store backend

If the cluster has been using etcd as meta store backend, follow these steps to restore from a meta snapshot.

  1. Shut down the meta service.

    note

    This step is especially important because the meta backup and recovery process does not replicate SST files. Multiple clusters must never run against the same set of SSTs at the same time, as this can corrupt the SST files.

  2. Create a new meta store, i.e. a new and empty etcd instance.

  3. Restore the meta snapshot to the new meta store.

    risectl \
    meta \
    restore-meta \
    --meta-store-type etcd \
    --meta-snapshot-id [snapshot_id] \
    --etcd-endpoints [etcd_endpoints, e.g. 127.0.0.1:2388] \
    --backup-storage-url [backup_storage_url, e.g. s3://bucket_read_from] \
    --backup-storage-directory [backup_storage_directory, e.g. dir_read_from] \
    --hummock-storage-url [hummock_storage_url, e.g. s3://bucket_write_to] \
    --hummock-storage-directory [hummock_storage_directory, e.g. dir_write_to]

    If authentication is enabled on etcd, also specify the following options:

    --etcd-auth \
    --etcd-username [etcd_username] \
    --etcd-password [etcd_password] \

    restore-meta reads snapshot data from backup storage and writes it to the meta store and hummock storage.

    For example, given the cluster settings below:

    psql=> show parameters;
               Name           |           Value           | Mutable
    --------------------------+---------------------------+---------
     state_store              | hummock+s3://state_bucket | f
     data_directory           | state_data                | f
     backup_storage_url       | s3://backup_bucket        | t
     backup_storage_directory | backup_data               | t

    The parameters passed to risectl meta restore-meta should be:

    • --backup-storage-url s3://backup_bucket
    • --backup-storage-directory backup_data
    • --hummock-storage-url s3://state_bucket (note that the hummock+ prefix is stripped)
    • --hummock-storage-directory state_data
  4. Configure meta service to use the new meta store.
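
The optional authentication options in step 3 can be handled with a small helper. This is a sketch only: build_etcd_restore_cmd is a hypothetical function, the storage values come from the example show parameters output, and the command is printed rather than executed.

```shell
# Print a restore-meta command for an etcd meta store.
# $1: snapshot ID, $2: etcd endpoints, $3/$4: optional etcd username/password.
build_etcd_restore_cmd() {
  cmd="risectl meta restore-meta --meta-store-type etcd"
  cmd="$cmd --meta-snapshot-id $1 --etcd-endpoints $2"
  # Add the authentication options only when a username is given.
  if [ -n "${3:-}" ]; then
    cmd="$cmd --etcd-auth --etcd-username $3 --etcd-password ${4:-}"
  fi
  # Storage values taken from the example `show parameters` output.
  cmd="$cmd --backup-storage-url s3://backup_bucket"
  cmd="$cmd --backup-storage-directory backup_data"
  cmd="$cmd --hummock-storage-url s3://state_bucket"
  cmd="$cmd --hummock-storage-directory state_data"
  printf '%s\n' "$cmd"
}

# Without authentication (snapshot ID 3 is illustrative).
build_etcd_restore_cmd 3 127.0.0.1:2388
```

Passing a username and password as the third and fourth arguments adds the three authentication options. Take care not to log the printed command in that case, since it contains the password.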

Access historical data backed up by meta snapshot

Meta snapshots support historical data access, also known as time travel queries.

Use the following steps to perform a time travel query.

  1. List all valid historical points in time (i.e., epochs) for a table. For example, to query the table with ID 6:

    SELECT state_table_info->'6'->>'safeEpoch' AS safe_epoch,
           state_table_info->'6'->>'committedEpoch' AS committed_epoch
    FROM rw_meta_snapshot;

    Example output:

        safe_epoch   | committed_epoch  
    -----------------+------------------
    7039353459507200 | 7039354678542336
    7039354678542346 | 7039622397886464

    Choose an epoch to query. Valid epochs fall within the range [safe_epoch, committed_epoch] of a snapshot, e.g. [7039353459507200, 7039354678542336] or [7039354678542346, 7039622397886464].

  2. Set the session config QUERY_EPOCH. The default value is 0, which disables historical queries.

    SET QUERY_EPOCH=[chosen epoch];

    Then, batch queries in this session return data as of this epoch instead of the latest one.

  3. Disable historical query.

    SET QUERY_EPOCH=0;
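
The range check from step 1 can be sketched as a small shell helper that prints the SET statement only for an epoch inside a snapshot's range. The function name epoch_in_range and the candidate epoch are illustrative. The range bounds are taken from the first row of the example output.

```shell
# Succeed only if epoch $1 lies within [safe_epoch $2, committed_epoch $3].
epoch_in_range() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Range from the example output; the candidate epoch is illustrative.
candidate=7039354000000000
if epoch_in_range "$candidate" 7039353459507200 7039354678542336; then
  printf 'SET QUERY_EPOCH=%s;\n' "$candidate"
else
  echo "epoch $candidate is outside the snapshot range" >&2
fi
```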

Limitation

RisingWave supports historical data access only at points in time that are covered by at least one meta snapshot.
