Real-time ad performance analysis
RisingWave makes it possible to do real-time ad performance analysis in a low code manner.
Overview
Ad platforms and advertisers alike want to measure the performance of their ads. They implement events that will be triggered and sent back to their servers upon certain user interactions (viewing or clicking an ad, installing an app, or making a purchase) on websites or mobile applications. Based on these events, they define various metrics to analyze ad performance from different angles.
Click-through rate (CTR) is one of the key metrics used in digital advertising to gauge the effectiveness of ads. It is calculated as the number of clicks divided by the number of impressions. Impressions are the number of times a digital ad displays on someone’s screen in an app or on a website. A high CTR means that users find the ads displayed to them useful and relevant.
In this tutorial, you will learn how to get real-time click-through rates from ad impressions and click events with RisingWave. We have set up a demo cluster specifically for this tutorial so that you can try it out with ease.
Prerequisites
- Ensure you have Docker and Docker Compose installed in your environment. Note that Docker Compose is included in Docker Desktop for Windows and macOS. If you use Docker Desktop, ensure that it is running before launching the demo cluster.
- Ensure that the PostgreSQL interactive terminal,
psql
, is installed in your environment. For detailed instructions, see Download PostgreSQL.
Step 1: Launch the demo cluster
In the demo cluster, we packaged RisingWave and a workload generator. The workload generator will start to generate random traffic and feed them into Kafka as soon as the cluster is started.
First, clone the risingwave repository to the environment.
Now navigate to the integration_tests/ad-ctr
directory and start the demo cluster from the docker compose file.
COMMAND NOT FOUND?
The default command-line syntax in Compose V2 starts with docker compose
. See details in the Docker docs.
If you’re using Compose V1, use docker-compose
instead.
Necessary RisingWave components, including compute node, metadata node, and MinIO, will be started. The workload generator will start to generate random data and feed them into Kafka topics. In this demo cluster, data of materialized views will be stored in the MinIO instance.
Step 2: Connect RisingWave to data streams
Now let’s connect to RisingWave so that we can manage data streams and perform data analysis.
We’ll treat ad impressions and ad click events as separate streams and use simplified schemas so that you can get the gist easily.
Below is the schema of ad impression events. In this schema, impression_timestamp
is the date and time when an ad is presented to a viewer, and bid_id
is the identifier of a bid request or activity for an online ad. When we calculate click-through rates, we must make sure that the impressions and the clicks are for the same bid request / activity. Otherwise, the results will be meaningless.
The schema of ad click events is like this:
For the same bid ID, impression_timestamp
should always be smaller (earlier) than click_timestamp
.
Now that we already set up these two data streams in Kafka (in JSON) using the demo cluster, we can connect to these two streams with the following SQL statements.
TIP
scan.startup.mode = 'earliest'
means the source will start streaming from the earliest entry in Kafka. Internally, RisingWave will record the consumed offset in the persistent state so that during a failure recovery, it will resume from the last consumed offset.
Now we have connected RisingWave to the streams, but RisingWave has not started to consume data yet. For data to be processed, we need to define materialized views. After a materialized view is created, RisingWave will start to consume data from the specified offset.
Step 3: Define materialized views and query the results
In this tutorial, we’ll create two materialized views, one for the standard CTR and the other for time-windowed CTR (every 5 minutes). The materialized view for the standard CTR is intended to show the calculation in a simplified way, while the time windowed CTR is intended to show the real-world CTR calculation.
Set up a materialized view for standard CTR
Let’s look at the materialized view for standard CTRs first. With this materialized view, we count the impressions and clicks separately, join them on ad_id
, and calculate CTR based on the latest impressions and clicks.
You will then be able to find out the performance of an ad by querying the materialized view you just created:
Here is an example result.
Set up a materialized view for 5-minute windowed CTR
Things will become a little complicated if we want the CTR of every 5 minutes. We need to use the “tumble” function to map every event in the stream into a 5-minute window. We’ll create a materialized view, ad_ctr_5min
, to calculate the time-windowed CTR. This view will distribute the impression events into time windows and aggregate the impression count for each ad in each time window. You can replace 5 minutes with whatever time window that works for you.
TIP
You can easily build a CTR live dashboard on top of ad_ctr_5min
. The CTR value is dynamically changing and every ad CTR in a given window can be drawn as a plot in the line chart. Eventually, we are able to analyze how CTR changes over time.
Now let’s see the results. Note that your results will be different, because the data in the streams are randomly generated by the workload generator.
You can rerun the query a couple of minutes later to see if the results are updated.
When you finish, run the following command to disconnect RisingWave.
Optional: To remove the containers and the data generated, use the following command.
Summary
In this tutorial, we have learned:
- How to join two sources.
- How to get time-windowed aggregate results by using the tumble time window function.
Was this page helpful?