Set up a RisingWave cluster in Kubernetes
This article will help you use the Kubernetes Operator for RisingWave (hereinafter ‘the Operator’) to deploy a RisingWave cluster in Kubernetes.
The Operator is a deployment and management system for RisingWave. It runs on top of Kubernetes and provides functionalities like provisioning, upgrading, scaling, and destroying the RisingWave instances inside the cluster.
Prerequisites
- Ensure that the Kubernetes command-line tool kubectl is installed in your environment.
- Ensure that the PostgreSQL interactive terminal psql is installed in your environment.
- Ensure that Docker is installed in your environment and running.
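The prerequisites above can be checked from a terminal before you continue. The following is a minimal sketch, not part of the official instructions; the helper name check_tool is hypothetical and only tests whether a command is on your PATH.

```shell
# Hypothetical helper: succeeds when the given command is available on PATH.
check_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report each prerequisite tool without aborting on the first missing one.
for tool in kubectl psql docker; do
  if check_tool "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done

# Docker must also be running: `docker info` fails when the daemon is unreachable.
if check_tool docker && ! docker info >/dev/null 2>&1; then
  echo "docker is installed, but the daemon does not appear to be running"
fi
```

Any tool reported as missing needs to be installed before the later steps will work.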
Create a Kubernetes cluster
The steps in this section are intended for creating a Kubernetes cluster in your local environment.
If you are using a managed Kubernetes service such as AKS, GKE, or EKS, refer to the corresponding documentation for instructions.
Steps:
- Install kind. kind is a tool for running local Kubernetes clusters using Docker containers as cluster nodes. You can see the available tags of kind on Docker Hub.
- Create a cluster.
kind create cluster
- Optional: Check if the cluster is created properly.
kubectl cluster-info
Deploy the Operator
Before the deployment, ensure that the following requirements are satisfied.
- Docker version ≥ 18.09
- kubectl version ≥ 1.18
- For Linux, set the value of the sysctl parameter net.ipv4.ip_forward to 1.
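The version requirements above can be compared mechanically. The sketch below assumes a sort that supports the -V (version sort) option, as GNU coreutils does; the function name version_ge and the literal version strings are illustrative only.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on `sort -V`, which orders dotted version strings numerically.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example with literal versions; substitute the versions reported by your tools
# (for Docker, `docker version --format '{{.Server.Version}}'` prints one).
if version_ge "20.10.5" "18.09"; then
  echo "Docker version satisfies >= 18.09"
fi
if ! version_ge "1.17" "1.18"; then
  echo "kubectl 1.17 would be too old"
fi

# On Linux, show the current IP forwarding setting (the requirement is 1).
if [ -r /proc/sys/net/ipv4/ip_forward ]; then
  echo "net.ipv4.ip_forward = $(cat /proc/sys/net/ipv4/ip_forward)"
fi
```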
Steps:
- Install cert-manager and wait a minute to allow for initialization.
- Install the latest version of the Operator.
kubectl apply --server-side -f https://github.com/risingwavelabs/risingwave-operator/releases/latest/download/risingwave-operator.yaml
If you'd like to install a certain version of the Operator
Run the following command to install a specific version instead of the latest version.
# Replace ${VERSION} with the version you want to install, e.g., v0.4.0
kubectl apply --server-side -f https://github.com/risingwavelabs/risingwave-operator/releases/download/${VERSION}/risingwave-operator.yaml
Compatibility table
Operator   RisingWave   Kubernetes
v0.4.0     v0.18.0+     v1.21+
v0.3.6     v0.18.0+     v1.21+
You can find the release notes of each version here.
Note: The following error might occur if cert-manager is not fully initialized. Wait another minute and rerun the command above.
Error from server (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.105.102.32:443: connect: connection refused
- Optional: Check if the Pods are running.
kubectl -n cert-manager get pods
kubectl -n risingwave-operator-system get pods
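The advice to wait and rerun the install command when cert-manager is not yet initialized can be automated with a small retry loop. This is a sketch under assumptions: the retry function is hypothetical, and the attempt count and delay shown in the usage comment are arbitrary choices, not recommended values.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Succeeds as soon as the command succeeds.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Usage (not executed here): retry the Operator installation up to 5 times,
# waiting 60 seconds between attempts.
# retry 5 60 kubectl apply --server-side -f https://github.com/risingwavelabs/risingwave-operator/releases/latest/download/risingwave-operator.yaml
```

The loop stops on the first success, so it adds no delay when cert-manager is already ready.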
Deploy a RisingWave instance
When deploying a RisingWave instance, you can choose from multiple object storage options to persist your data. Depending on the option you choose, the deployment instructions are different.
Notes about disks for etcd
RisingWave uses etcd for persisting data for meta nodes. It's important to note that etcd is highly sensitive to disk write latency. Slow disk performance can lead to increased etcd request latency and potentially impact the stability of the cluster.
To ensure optimal performance and cluster stability, please consider the following recommendations:
- For optimum disk performance, we recommend using a typical local SSD or a high-performance virtualized block device. If you choose to deploy etcd on Amazon EBS, we recommend gp3 or faster SSD volumes.
- If you have a single meta node, please increase the value of meta_leader_lease_secs to optimize performance.
- If MinIO is being used, avoid deploying etcd and MinIO on the same disks to prevent any potential conflicts or performance degradation.
For detailed disk performance requirements and recommendations, see the Disks section in the etcd documentation.
Optional: Customize the state store directory
You can customize the directory for storing state data via the spec: stateStore: dataDirectory parameter in the risingwave.yaml file that you want to use to deploy a RisingWave instance. If you have multiple RisingWave instances, ensure the value of dataDirectory for the new instance is unique (the default value is hummock). Otherwise, the new RisingWave instance may crash. Save the changes to the risingwave.yaml file before running the kubectl apply -f <...risingwave.yaml> command. The directory path cannot be an absolute path, such as /a/b, and must be no longer than 180 characters.
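The two constraints on dataDirectory stated above (not an absolute path, at most 180 characters) can be checked before you edit the manifest. The function name valid_data_directory and the example value are hypothetical; the sketch encodes only those two rules and does not check uniqueness across instances.

```shell
# Hypothetical helper: succeeds when the candidate dataDirectory value is a
# relative path of at most 180 characters.
valid_data_directory() {
  case "$1" in
    /*) return 1 ;;  # absolute paths such as /a/b are not allowed
  esac
  [ "${#1}" -le 180 ]
}

# Example: validate a candidate value for a second RisingWave instance.
candidate="hummock-instance-2"
if valid_data_directory "$candidate"; then
  echo "dataDirectory '$candidate' satisfies the path rules"
else
  echo "dataDirectory '$candidate' is not valid"
fi
```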
- etcd+S3
- etcd+MinIO
- etcd+HDFS
etcd+S3
RisingWave supports using Amazon S3 as object storage for persistent data.
Steps:
- Create a Secret with the name s3-credentials.
kubectl create secret generic s3-credentials --from-literal AccessKeyID=${ACCESS_KEY} --from-literal SecretAccessKey=${SECRET_ACCESS_KEY}
- On the S3 console, create a bucket with the name risingwave in the US East (N. Virginia) (us-east-1) region.
- Deploy a RisingWave instance with S3 as the object storage.
kubectl apply -f https://raw.githubusercontent.com/risingwavelabs/risingwave-operator/main/docs/manifests/stable/persistent/s3/risingwave.yaml
Optional: Customize the name and region of the S3 bucket
Before executing the above command, customize the S3 bucket according to your specific requirements by following these steps.
- Download the manifest file from the link above.
- Open the downloaded file and modify the necessary fields, such as the bucket name and region, according to your preferences.
- Save the modified file to your local file system.
- Replace the URL in the command with the local file path of the modified manifest file and then run the command. For example:
kubectl apply -f a.yaml # relative path
kubectl apply -f /tmp/a.yaml # absolute path
etcd+MinIO
RisingWave supports using MinIO as object storage for persistent data.
The performance of MinIO is closely tied to the disk performance of the node where it is hosted. We have observed that AWS EBS does not perform well in our tests. For optimal performance, we recommend using S3 or a compatible cloud service.
Run the following command to deploy a RisingWave instance with MinIO as the object storage.
kubectl apply -f https://raw.githubusercontent.com/risingwavelabs/risingwave-operator/main/docs/manifests/stable/persistent/minio/risingwave.yaml
etcd+HDFS
RisingWave supports using HDFS as object storage for persistent data.
Deploy a RisingWave instance with HDFS as the object storage.
kubectl apply -f https://raw.githubusercontent.com/risingwavelabs/risingwave-operator/main/docs/manifests/risingwave/risingwave-etcd-hdfs.yaml
You can check the status of the RisingWave instance by running the following command.
kubectl get risingwave
If the instance is running properly, the output should look like this:
etcd+S3:
NAME         RUNNING   STORAGE(META)   STORAGE(OBJECT)   AGE
risingwave   True      etcd            S3                30s
etcd+MinIO:
NAME         RUNNING   STORAGE(META)   STORAGE(OBJECT)   AGE
risingwave   True      etcd            MinIO             30s
etcd+HDFS:
NAME                   RUNNING   STORAGE(META)   STORAGE(OBJECT)   AGE
risingwave-etcd-hdfs   True      Etcd            HDFS              30s
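If you want to check the status from a script rather than by eye, the RUNNING column of the output above can be parsed. The sketch below is illustrative: is_running is a hypothetical helper, and it assumes the column layout shown above (instance name first, RUNNING second).

```shell
# Hypothetical helper: reads `kubectl get risingwave` output on stdin and
# succeeds only if the named instance is listed with RUNNING set to True.
is_running() {
  awk -v name="$1" '
    NR > 1 && $1 == name { found = 1; ok = ($2 == "True") }
    END { exit !(found && ok) }
  '
}

# Demonstration on sample text copied from the expected output.
sample='NAME RUNNING STORAGE(META) STORAGE(OBJECT) AGE
risingwave True etcd S3 30s'
if printf '%s\n' "$sample" | is_running risingwave; then
  echo "risingwave is running"
fi

# Against a live cluster (not executed here):
# kubectl get risingwave | is_running risingwave
```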
Connect to RisingWave
- ClusterIP
- NodePort
- LoadBalancer
ClusterIP
By default, the Operator creates a Service of type ClusterIP for the frontend component, through which you can interact with RisingWave. This Service is not accessible from outside Kubernetes, so you need to create a standalone Pod running the psql client inside Kubernetes.
Steps:
- Create a Pod.
kubectl apply -f https://raw.githubusercontent.com/risingwavelabs/risingwave-operator/main/docs/manifests/psql/psql-console.yaml
- Attach to the Pod so that you can execute commands inside the container.
kubectl exec -it psql-console -- bash
- Connect to RisingWave via psql.
etcd+MinIO and etcd+S3:
psql -h risingwave-frontend -p 4567 -d dev -U root
etcd+HDFS:
psql -h risingwave-etcd-hdfs-frontend -p 4567 -d dev -U root
NodePort
You can connect to RisingWave from Nodes, such as EC2 instances, in Kubernetes.
Steps:
- In the risingwave.yaml file that you use to deploy the RisingWave instance, add a frontendServiceType parameter to the configuration of the RisingWave service, and set its value to NodePort.
# ...
kind: RisingWave
# ...
spec:
  frontendServiceType: NodePort
# ...
- Connect to RisingWave by running the following commands on the Node.
etcd+MinIO and etcd+S3:
export RISINGWAVE_NAME=risingwave
export RISINGWAVE_NAMESPACE=default
export RISINGWAVE_HOST=`kubectl -n ${RISINGWAVE_NAMESPACE} get node -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'`
export RISINGWAVE_PORT=`kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].spec.ports[0].nodePort}'`
psql -h ${RISINGWAVE_HOST} -p ${RISINGWAVE_PORT} -d dev -U root
etcd+HDFS:
export RISINGWAVE_NAME=risingwave-etcd-hdfs
export RISINGWAVE_NAMESPACE=default
export RISINGWAVE_HOST=`kubectl -n ${RISINGWAVE_NAMESPACE} get node -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'`
export RISINGWAVE_PORT=`kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].spec.ports[0].nodePort}'`
psql -h ${RISINGWAVE_HOST} -p ${RISINGWAVE_PORT} -d dev -U root
LoadBalancer
If you are using EKS, GKE, or other managed Kubernetes services provided by cloud vendors, you can expose the Service to the public network with a load balancer in the cloud.
Steps:
- Set the Service type to LoadBalancer.
# ...
spec:
  global:
    serviceType: LoadBalancer
# ...
- Connect to RisingWave with the following commands.
etcd+MinIO and etcd+S3:
export RISINGWAVE_NAME=risingwave
export RISINGWAVE_NAMESPACE=default
export RISINGWAVE_HOST=`kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'`
export RISINGWAVE_PORT=`kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].spec.ports[0].port}'`
psql -h ${RISINGWAVE_HOST} -p ${RISINGWAVE_PORT} -d dev -U root
etcd+HDFS:
export RISINGWAVE_NAME=risingwave-etcd-hdfs
export RISINGWAVE_NAMESPACE=default
export RISINGWAVE_HOST=`kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'`
export RISINGWAVE_PORT=`kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].spec.ports[0].port}'`
psql -h ${RISINGWAVE_HOST} -p ${RISINGWAVE_PORT} -d dev -U root
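The LoadBalancer commands above read only the ip field of the load balancer ingress. Some cloud load balancers (AWS is a common case) publish a hostname instead, which would leave RISINGWAVE_HOST empty. The following sketch shows one possible fallback; pick_host is a hypothetical helper, and the commented usage assumes the same variables as the commands above.

```shell
# Hypothetical helper: print the first argument if it is non-empty,
# otherwise print the second. Intended for "ip, else hostname" selection.
pick_host() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$2"
  fi
}

# Usage against a live cluster (not executed here):
# LB_IP=$(kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')
# LB_HOSTNAME=$(kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')
# export RISINGWAVE_HOST=$(pick_host "$LB_IP" "$LB_HOSTNAME")
```

If both fields are empty, the load balancer has most likely not finished provisioning, and the lookup can be repeated after a short wait.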
Now you can ingest and transform streaming data. See Quick start for details.