Amazon Managed Streaming for Apache Kafka (MSK) is a fully managed service that simplifies the setup, scaling, and management of Apache Kafka clusters, a popular open-source distributed streaming platform.
Kafka is designed to handle real-time data feeds and follows the publisher-subscriber (pub-sub) model. Kafka’s ability to handle high-volume real-time data makes it crucial for data pipelines, analytics, and event-driven architectures.
To ingest data from Amazon MSK into RisingWave, you need a running Amazon MSK cluster and an existing Kafka topic. You then use the Kafka connector in RisingWave to consume data from your MSK topic.
This guide describes how to ingest streaming data from Amazon MSK into RisingWave.
To learn how to set up an Amazon MSK account and create a cluster, see Getting started using Amazon MSK. This demo assumes that you select Quick create as the Cluster creation method and Provisioned as the Cluster type. Cluster creation can take about 15 minutes.
While creating your cluster, note down the following information regarding the cluster you want to connect to.
To customize the IAM policy, see IAM access control.
To learn how to create an EC2 client machine and add the security group of the client to the inbound rules of the cluster’s security group from the VPC console, see Create a client machine.
For more information regarding SASL settings, see Sign-in credentials authentication with AWS Secrets Manager.
For more information, see Creating symmetric encryption KMS keys.
Replace <your-username> and <your-password> with the username and password you want to set for the cluster.
The secret name must begin with the prefix AmazonMSK_.
For more information, see Sign-in credentials authentication with AWS Secrets Manager.
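As a sketch of the secret payload, the script below writes the username and password as a JSON object with `username` and `password` keys, which is the format AWS documents for MSK SASL/SCRAM secrets. The file name `msk_secret.json` and the environment variables are assumptions made for illustration, and the script does not escape special characters in the password.

```shell
set -eu

# Placeholders from this guide; export real values before running.
MSK_USERNAME="${MSK_USERNAME:-<your-username>}"
MSK_PASSWORD="${MSK_PASSWORD:-<your-password>}"

# Hypothetical file name for the secret's plaintext payload.
SECRET_FILE="${SECRET_FILE:-msk_secret.json}"

# The secret is a JSON object with "username" and "password" keys.
cat > "$SECRET_FILE" <<EOF
{
  "username": "${MSK_USERNAME}",
  "password": "${MSK_PASSWORD}"
}
EOF

echo "Wrote ${SECRET_FILE}"
```

If you prefer the AWS CLI to the console, a file like this can be passed to `aws secretsmanager create-secret` through `--secret-string file://msk_secret.json`, together with `--name` and `--kms-key-id`. Delete the file afterwards so the credentials do not linger on disk.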
To find your specific command values:
Create a file named users_jaas.conf with the following contents in /home/ubuntu.
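A minimal sketch of this file, assuming SASL/SCRAM authentication: the `KafkaClient` entry uses Kafka's `ScramLoginModule`, following the AWS MSK documentation. `CONF_DIR` is a hypothetical variable that defaults to the current directory; set it to `/home/ubuntu` to match this guide.

```shell
set -eu

# Hypothetical variable; use CONF_DIR=/home/ubuntu to follow the guide exactly.
CONF_DIR="${CONF_DIR:-$PWD}"

# JAAS configuration for SCRAM; replace the placeholders with your credentials.
cat > "${CONF_DIR}/users_jaas.conf" <<'EOF'
KafkaClient {
   org.apache.kafka.common.security.scram.ScramLoginModule required
   username="<your-username>"
   password="<your-password>";
};
EOF

# The Kafka command-line tools read the JAAS file location from KAFKA_OPTS.
export KAFKA_OPTS="-Djava.security.auth.login.config=${CONF_DIR}/users_jaas.conf"
```

Exporting `KAFKA_OPTS` only affects the current shell session, so repeat it in any new terminal before running the Kafka tools.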
Copy the JDK key store file from your JVM cacerts folder into kafka.client.truststore.jks.
Create a file named client_sasl.properties at /home/ubuntu with the following contents.
The broker URL is referred to as <broker-url> from now on.
Optional: The following command will list the topics.
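The sketch below covers the client properties file and the optional topic listing, assuming SASL/SCRAM over TLS as in the AWS MSK documentation. `CONF_DIR` and `KAFKA_HOME` are hypothetical variables, and the truststore copy is left as a comment because the JVM path differs between installations.

```shell
set -eu

# Hypothetical locations; adjust to your client machine.
CONF_DIR="${CONF_DIR:-$PWD}"
KAFKA_HOME="${KAFKA_HOME:-./kafka}"
BROKER_URL="${BROKER_URL:-<broker-url>}"

# Truststore copy (path depends on your JVM, so it is not executed here):
# cp "<path-to-jvm>/lib/security/cacerts" "${CONF_DIR}/kafka.client.truststore.jks"

# Client properties for SASL/SCRAM over TLS.
cat > "${CONF_DIR}/client_sasl.properties" <<EOF
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
ssl.truststore.location=${CONF_DIR}/kafka.client.truststore.jks
EOF

# Assemble the topic-listing command and print it for review.
LIST_TOPICS_CMD="${KAFKA_HOME}/bin/kafka-topics.sh --bootstrap-server ${BROKER_URL} --command-config ${CONF_DIR}/client_sasl.properties --list"
echo "$LIST_TOPICS_CMD"
```

Run the printed command on the client machine once `<broker-url>` is replaced with your bootstrap broker string.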
Once you run the kafka-console-producer command, you will be prompted to enter messages into the console. Each message should be entered on a new line; you can enter as many messages as you like.
After entering messages, you can close the console window or press Ctrl + C to exit the producer.
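Instead of typing interactively, you can prepare the messages in a file, one per line, and redirect it to the producer's standard input. The sketch below assumes JSON messages with two hypothetical fields, `v1` and `v2`; the file name and sample values are illustrative only.

```shell
set -eu

# Hypothetical file holding one JSON message per line.
MESSAGES_FILE="${MESSAGES_FILE:-messages.jsonl}"

cat > "$MESSAGES_FILE" <<'EOF'
{"v1": 1, "v2": "first message"}
{"v1": 2, "v2": "second message"}
{"v1": 3, "v2": "third message"}
EOF

# Count the prepared messages (one per line).
MESSAGE_COUNT=$(wc -l < "$MESSAGES_FILE" | tr -d ' ')
echo "Prepared ${MESSAGE_COUNT} messages in ${MESSAGES_FILE}"

# To send them from the client machine (not executed here):
# "${KAFKA_HOME}/bin/kafka-console-producer.sh" \
#   --bootstrap-server "<broker-url>" \
#   --topic "<topic-name>" \
#   --producer.config "${CONF_DIR}/client_sasl.properties" < "$MESSAGES_FILE"
```

When the input comes from a file, the producer exits on its own at end of file, so no Ctrl + C is needed.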
See Quick start for options on how you can run RisingWave.
To learn about the specific syntax used to consume data from a Kafka topic, see Ingest data from Kafka.
For example, the following query creates a table that consumes data from an MSK topic.
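A sketch of such a statement, assuming SASL/SCRAM authentication and JSON messages. The table name `s`, the columns `v1` and `v2`, and the file name are hypothetical, and the connector parameters reflect RisingWave's Kafka connector as commonly documented; confirm them against Ingest data from Kafka for your version. The script stores the statement in a file so it can be passed to `psql`.

```shell
set -eu

# Hypothetical file name for the statement.
SQL_FILE="${SQL_FILE:-create_msk_table.sql}"

cat > "$SQL_FILE" <<'EOF'
-- Table and column names are hypothetical; match them to your topic's JSON fields.
CREATE TABLE s (v1 int, v2 varchar)
WITH (
    connector = 'kafka',
    topic = '<topic-name>',
    properties.bootstrap.server = '<broker-url>',
    scan.startup.mode = 'earliest',
    properties.sasl.mechanism = 'SCRAM-SHA-512',
    properties.security.protocol = 'sasl_ssl',
    properties.sasl.username = '<your-username>',
    properties.sasl.password = '<your-password>'
) FORMAT PLAIN ENCODE JSON;
EOF

# Connection settings assume a default local RisingWave instance.
echo "psql -h localhost -p 4566 -d dev -U root -f ${SQL_FILE}"
```

Setting `scan.startup.mode` to `earliest` makes the table read the topic from its oldest available message, which is convenient when the test messages were produced before the table existed.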
Then, you can count the records to verify that all messages were ingested.
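As a minimal sketch, assuming the table is named `s` (a hypothetical name) and RisingWave is reachable with its default local connection settings, the count can be issued through `psql`:

```shell
set -eu

# Hypothetical table name; replace "s" with the table you created.
COUNT_SQL="SELECT COUNT(*) FROM s;"

# Print the psql invocation; run it where RisingWave is reachable.
echo "psql -h localhost -p 4566 -d dev -U root -c \"${COUNT_SQL}\""
```

The result should equal the number of messages you produced to the topic.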
To create a cluster and set up an IAM role for the cluster, see Getting started using Amazon MSK.
RisingWave requires the following permissions to access MSK:
kafka-cluster:Connect
kafka-cluster:DescribeTopic
kafka-cluster:DescribeGroup
kafka-cluster:AlterGroup
kafka-cluster:ReadData
kafka-cluster:WriteData
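The permissions above can be sketched as a single IAM policy statement. The file name and the `<cluster-arn>`, `<topic-arn>`, and `<group-arn>` placeholders are assumptions; scope the resources to your own cluster, topics, and consumer groups, and see IAM access control for the authoritative policy format.

```shell
set -eu

# Hypothetical file name for the policy document.
POLICY_FILE="${POLICY_FILE:-risingwave_msk_policy.json}"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:Connect",
        "kafka-cluster:DescribeTopic",
        "kafka-cluster:DescribeGroup",
        "kafka-cluster:AlterGroup",
        "kafka-cluster:ReadData",
        "kafka-cluster:WriteData"
      ],
      "Resource": [
        "<cluster-arn>",
        "<topic-arn>",
        "<group-arn>"
      ]
    }
  ]
}
EOF

# Count the kafka-cluster actions written to the policy file.
ACTION_COUNT=$(grep -c 'kafka-cluster:' "$POLICY_FILE")
echo "Policy lists ${ACTION_COUNT} kafka-cluster actions"
```

Splitting the statement per resource type (cluster, topic, group) would give a tighter policy; the single statement is used here only to keep the sketch short.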
To access MSK using IAM, you need to use the AWS_MSK_IAM SASL mechanism. You also need to specify the following parameters.
Parameter | Notes |
---|---|
aws.region | Required. AWS service region code. For example, us-east-1 for US East (N. Virginia). |
aws.credentials.access_key_id | Required. This field indicates the access key ID of AWS. |
aws.credentials.secret_access_key | Required. This field indicates the secret access key of AWS. |
aws.credentials.session_token | Optional. The session token associated with the temporary security credentials. Using this field is not recommended because RisingWave runs long-running jobs and the token may expire. Creating a new role is preferred. |
aws.credentials.role.arn | Optional. The Amazon Resource Name (ARN) of the role to assume. |
aws.credentials.role.external_id | Optional. The external ID used to authorize access to third-party resources. |
aws.msk.signer_timeout_sec | Optional. The timeout, in seconds, for loading AWS credentials for MSK. |
Here is an example of creating a sink authenticated with AWS_MSK_IAM on AWS.
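A sketch of such a statement, using only the required parameters from the table above. The sink name `s1`, the source table `t`, and the file name are hypothetical, and the non-AWS connector parameters reflect RisingWave's Kafka connector as commonly documented; verify them against the Kafka sink reference for your version.

```shell
set -eu

# Hypothetical file name for the statement.
SINK_SQL_FILE="${SINK_SQL_FILE:-create_msk_sink.sql}"

cat > "$SINK_SQL_FILE" <<'EOF'
-- Sink name s1 and source table t are hypothetical.
CREATE SINK s1 FROM t
WITH (
    connector = 'kafka',
    topic = '<topic-name>',
    properties.bootstrap.server = '<broker-url>',
    properties.sasl.mechanism = 'AWS_MSK_IAM',
    properties.security.protocol = 'sasl_ssl',
    aws.region = '<region>',
    aws.credentials.access_key_id = '<access-key-id>',
    aws.credentials.secret_access_key = '<secret-access-key>'
) FORMAT PLAIN ENCODE JSON;
EOF

# Connection settings assume a default local RisingWave instance.
echo "psql -h localhost -p 4566 -d dev -U root -f ${SINK_SQL_FILE}"
```

To assume a role instead of relying only on static keys, add `aws.credentials.role.arn` (and, if required, `aws.credentials.role.external_id`) to the `WITH` clause.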