RisingWave makes it possible to do real-time ad performance analysis in a low code manner.
Ad platforms and advertisers alike want to measure the performance of their ads. They implement events that will be triggered and sent back to their servers upon certain user interactions (viewing or clicking an ad, installing an app, or making a purchase) on websites or mobile applications. Based on these events, they define various metrics to analyze ad performance from different angles.
Click-through rate (CTR) is one of the key metrics used in digital advertising to gauge the effectiveness of ads. It is calculated as the number of clicks divided by the number of impressions. Impressions are the number of times a digital ad displays on someone’s screen in an app or on a website. A high CTR means that users find the ads displayed to them useful and relevant.
In this tutorial, you will learn how to get real-time click-through rates from ad impressions and click events with RisingWave. We have set up a demo cluster specifically for this tutorial so that you can try it out with ease.
psql
, is installed in your environment. For detailed instructions, see Download PostgreSQL.In the demo cluster, we packaged RisingWave and a workload generator. The workload generator will start to generate random traffic and feed them into Kafka as soon as the cluster is started.
First, clone the risingwave repository to the environment.
Now navigate to the integration_tests/ad-ctr
directory and start the demo cluster from the docker compose file.
COMMAND NOT FOUND?
The default command-line syntax in Compose V2 starts with docker compose
. See details in the Docker docs.
If you’re using Compose V1, use docker-compose
instead.
Necessary RisingWave components, including compute node, metadata node, and MinIO, will be started. The workload generator will start to generate random data and feed them into Kafka topics. In this demo cluster, data of materialized views will be stored in the MinIO instance.
Now let’s connect to RisingWave so that we can manage data streams and perform data analysis.
We’ll treat ad impressions and ad click events as separate streams and use simplified schemas so that you can get the gist easily.
Below is the schema of ad impression events. In this schema, impression_timestamp
is the date and time when an ad is presented to a viewer, and bid_id
is the identifier of a bid request or activity for an online ad. When we calculate click-through rates, we must make sure that the impressions and the clicks are for the same bid request / activity. Otherwise, the results will be meaningless.
The schema of ad click events is like this:
For the same bid ID, impression_timestamp
should always be smaller (earlier) than click_timestamp
.
Now that we already set up these two data streams in Kafka (in JSON) using the demo cluster, we can connect to these two streams with the following SQL statements.
scan.startup.mode = 'earliest'
means the source will start streaming from the earliest entry in Kafka. Internally, RisingWave will record the consumed offset in the persistent state so that during a failure recovery, it will resume from the last consumed offset.
Now we have connected RisingWave to the streams, but RisingWave has not started to consume data yet. For data to be processed, we need to define materialized views. After a materialized view is created, RisingWave will start to consume data from the specified offset.
In this tutorial, we’ll create two materialized views, one for the standard CTR and the other for time-windowed CTR (every 5 minutes). The materialized view for the standard CTR is intended to show the calculation in a simplified way, while the time windowed CTR is intended to show the real-world CTR calculation.
Let’s look at the materialized view for standard CTRs first. With this materialized view, we count the impressions and clicks separately, join them on ad_id
, and calculate CTR based on the latest impressions and clicks.
You will then be able to find out the performance of an ad by querying the materialized view you just created:
Here is an example result.
Things will become a little complicated if we want the CTR of every 5 minutes. We need to use the “tumble” function to map every event in the stream into a 5-minute window. We’ll create a materialized view, ad_ctr_5min
, to calculate the time-windowed CTR. This view will distribute the impression events into time windows and aggregate the impression count for each ad in each time window. You can replace 5 minutes with whatever time window that works for you.
You can easily build a CTR live dashboard on top of ad_ctr_5min
. The CTR value is dynamically changing and every ad CTR in a given window can be drawn as a plot in the line chart. Eventually, we are able to analyze how CTR changes over time.
Now let’s see the results. Note that your results will be different, because the data in the streams are randomly generated by the workload generator.
You can rerun the query a couple of minutes later to see if the results are updated.
When you finish, run the following command to disconnect RisingWave.
Optional: To remove the containers and the data generated, use the following command.
In this tutorial, we have learned:
RisingWave makes it possible to do real-time ad performance analysis in a low code manner.
Ad platforms and advertisers alike want to measure the performance of their ads. They implement events that will be triggered and sent back to their servers upon certain user interactions (viewing or clicking an ad, installing an app, or making a purchase) on websites or mobile applications. Based on these events, they define various metrics to analyze ad performance from different angles.
Click-through rate (CTR) is one of the key metrics used in digital advertising to gauge the effectiveness of ads. It is calculated as the number of clicks divided by the number of impressions. Impressions are the number of times a digital ad displays on someone’s screen in an app or on a website. A high CTR means that users find the ads displayed to them useful and relevant.
In this tutorial, you will learn how to get real-time click-through rates from ad impressions and click events with RisingWave. We have set up a demo cluster specifically for this tutorial so that you can try it out with ease.
psql
, is installed in your environment. For detailed instructions, see Download PostgreSQL.In the demo cluster, we packaged RisingWave and a workload generator. The workload generator will start to generate random traffic and feed them into Kafka as soon as the cluster is started.
First, clone the risingwave repository to the environment.
Now navigate to the integration_tests/ad-ctr
directory and start the demo cluster from the docker compose file.
COMMAND NOT FOUND?
The default command-line syntax in Compose V2 starts with docker compose
. See details in the Docker docs.
If you’re using Compose V1, use docker-compose
instead.
Necessary RisingWave components, including compute node, metadata node, and MinIO, will be started. The workload generator will start to generate random data and feed them into Kafka topics. In this demo cluster, data of materialized views will be stored in the MinIO instance.
Now let’s connect to RisingWave so that we can manage data streams and perform data analysis.
We’ll treat ad impressions and ad click events as separate streams and use simplified schemas so that you can get the gist easily.
Below is the schema of ad impression events. In this schema, impression_timestamp
is the date and time when an ad is presented to a viewer, and bid_id
is the identifier of a bid request or activity for an online ad. When we calculate click-through rates, we must make sure that the impressions and the clicks are for the same bid request / activity. Otherwise, the results will be meaningless.
The schema of ad click events is like this:
For the same bid ID, impression_timestamp
should always be smaller (earlier) than click_timestamp
.
Now that we already set up these two data streams in Kafka (in JSON) using the demo cluster, we can connect to these two streams with the following SQL statements.
scan.startup.mode = 'earliest'
means the source will start streaming from the earliest entry in Kafka. Internally, RisingWave will record the consumed offset in the persistent state so that during a failure recovery, it will resume from the last consumed offset.
Now we have connected RisingWave to the streams, but RisingWave has not started to consume data yet. For data to be processed, we need to define materialized views. After a materialized view is created, RisingWave will start to consume data from the specified offset.
In this tutorial, we’ll create two materialized views, one for the standard CTR and the other for time-windowed CTR (every 5 minutes). The materialized view for the standard CTR is intended to show the calculation in a simplified way, while the time windowed CTR is intended to show the real-world CTR calculation.
Let’s look at the materialized view for standard CTRs first. With this materialized view, we count the impressions and clicks separately, join them on ad_id
, and calculate CTR based on the latest impressions and clicks.
You will then be able to find out the performance of an ad by querying the materialized view you just created:
Here is an example result.
Things will become a little complicated if we want the CTR of every 5 minutes. We need to use the “tumble” function to map every event in the stream into a 5-minute window. We’ll create a materialized view, ad_ctr_5min
, to calculate the time-windowed CTR. This view will distribute the impression events into time windows and aggregate the impression count for each ad in each time window. You can replace 5 minutes with whatever time window that works for you.
You can easily build a CTR live dashboard on top of ad_ctr_5min
. The CTR value is dynamically changing and every ad CTR in a given window can be drawn as a plot in the line chart. Eventually, we are able to analyze how CTR changes over time.
Now let’s see the results. Note that your results will be different, because the data in the streams are randomly generated by the workload generator.
You can rerun the query a couple of minutes later to see if the results are updated.
When you finish, run the following command to disconnect RisingWave.
Optional: To remove the containers and the data generated, use the following command.
In this tutorial, we have learned: