
Data Transparency in Ad Tech: Empowering Our Clients Through Their Own Data

August 19, 2025
Unlocking the power of event-driven data to give clients full visibility and control over their own insights.

Introduction

  • Event Pipe: Real-time event orchestration with filtering, enrichment, and routing.
  • Event Store: Structured, scalable, and flexible event storage in CloudEvents format.
  • Full transparency: Empowering clients to detect fraud, optimize campaigns, and own their insights.

 

In the Ad Tech industry, where each interaction generates a continuous flow of events, data transparency is a powerful yet underexplored strategic tool.

In a scenario where privacy and data ownership are increasingly valued, solutions that efficiently structure and store events provide clients not only with greater clarity in tracking but also the ability to accurately identify trends and opportunities. This approach ensures a continuous flow of information, where every click, view, or conversion is stored, processed, and transformed into strategic data, enabling more informed and knowledge-based decisions.

By combining event pipes – which orchestrate and direct events in real time to different destinations, such as a webhook, Google Spreadsheet, or event stores – with systems responsible for storing events in a structured way, it is possible to transform raw data into powerful insights. This offers the client a level of visibility that redefines control over their operations.

In this article, we explore how our approach, based on event pipes and event stores, empowers clients to harness the full potential of their data, ensuring not only transparency but also valuable insights that drive results and generate a competitive advantage.

Challenge We Faced

The Marketing and Ad Tech sectors generate a massive volume of data daily, and transparency in how this data is handled is an essential demand from customers and advertisers. Many players that do make this data available limit customer analysis to their own platforms and still charge a premium for it, revealing the difficulty they face in implementing processes that ensure the reliability and clarity of the numbers presented. Some of the industry’s main problems include:

Inaccurate measurement and attribution

The difficulty in accurately measuring campaign metrics, such as impressions, clicks and conversions, directly impacts results. Many advertising agencies limit themselves to presenting raw numbers, such as 100 impressions, 80 views and 20 clicks on an ad.

However, this isolated data does not provide enough context to answer essential questions, such as: When did these interactions occur? What is the location of these impressions, views and clicks? How long did it take from the impression to the user’s click? Without these answers, advertisers lose valuable insights into the effectiveness of their campaigns. Empowering clients with detailed data not only improves transparency but also strengthens decision-making based on reliable metrics.

Given this challenge, our platform adopts an approach based on enriched data and contextual analysis, allowing advertisers to understand not only the quantity but also the quality and behavior behind the interactions. With the implementation of advanced technologies and intelligent tracking mechanisms, we are able to provide detailed metrics about the time, location and response time between each stage of the user journey. This transparency not only improves the evaluation of campaign performance but also allows for more precise optimizations, maximizing impact and return on investment.
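To make the idea of contextual metrics concrete, here is a minimal sketch of computing the time between an impression and a click from timestamped events. The event shape (`type` and `time` keys) and the type names are illustrative assumptions, not the platform's actual schema.

```python
from datetime import datetime

def interaction_latency(events):
    """Seconds between the first impression and the first click.

    `events` is a list of dicts with hypothetical `type` and `time`
    keys (ISO 8601 timestamps); returns None if either is missing.
    """
    def first_time(event_type):
        times = [
            datetime.fromisoformat(e["time"])
            for e in events
            if e["type"] == event_type
        ]
        return min(times) if times else None

    impression = first_time("ad.impression")
    click = first_time("ad.click")
    if impression is None or click is None:
        return None
    return (click - impression).total_seconds()

events = [
    {"type": "ad.impression", "time": "2025-08-19T12:00:00+00:00"},
    {"type": "ad.click", "time": "2025-08-19T12:00:42+00:00"},
]
print(interaction_latency(events))  # 42.0
```

With per-event timestamps stored rather than aggregate counts, questions like "how long from impression to click?" become simple queries instead of guesses.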

Challenges in data transparency and click reliability

The lack of clarity in data collection and presentation can create distrust among advertisers, jeopardizing long-term partnerships. When there is no detailed access to metrics or explanations about their origin, the numbers presented may seem arbitrary, reducing the platform’s credibility.

In the Ad Tech market, advertising fraud, such as fraudulent clicks and bot-generated traffic, is a constant challenge that impacts the reliability of metrics. Automated networks can artificially inflate engagement numbers, making it difficult to differentiate genuine interactions from malicious activity. Additionally, there are cases of click fraud bundling, where multiple suspicious events are grouped together and masked to appear legitimate, compromising measurement accuracy.

In light of this scenario, our platform employs advanced validation and monitoring mechanisms to ensure that the data presented to clients reflects real and reliable interactions. By implementing specialized technologies and robust filters, we are able to identify suspicious patterns and mitigate fraud, preserving the integrity of advertising campaigns and optimizing return on investment for advertisers.

Another side effect of the lack of reliable data in the industry is inefficiency in campaign planning. Without a clear view of the target audience and access to accurate data, marketing professionals end up optimizing campaigns with incomplete or incorrect information, resulting in wasted budgets and reduced impact on media strategies. Our transparency- and accuracy-focused approach enables advertisers to make more assertive decisions and maximize their results.

How We Solved It

To empower our clients to use their own data, we adopted an Event-Driven Design approach, structuring two main systems: Event Pipe and Event Store. These systems were designed to provide clients with flexibility and control over their events and information.

Technologies

We chose to use AWS services to develop our solution, as a significant portion of our system already operates on this infrastructure. By keeping AWS as our provider, we were able to simplify integration and ensure greater operational consistency.

Among the chosen technologies, we used:

  • SNS with SQS: For more sensitive events, we employed SNS (Simple Notification Service) with SQS (Simple Queue Service), ensuring that messages are processed reliably. In the event of a processing failure, events are routed to a DLQ (Dead Letter Queue), allowing for later reprocessing.
  • Kafka (AWS MSK): For less sensitive events, we used Kafka with AWS’s managed service, MSK (Managed Streaming for Apache Kafka).
  • Kafka Connect: To integrate the event flow from Kafka with other systems, we used Kafka Connect, enabling greater flexibility in data routing and processing.
  • KSQLDB: A streaming database developed by Confluent to operate with Apache Kafka. KSQLDB allows us to create, process, and query data in real-time using SQL, filtering and forwarding only relevant events for processing and storage for later analysis by clients.
  • Amazon S3: To store events in a scalable and accessible manner, we used Amazon S3, where data is persisted in JSONL (JSON Lines) format, ensuring ease of querying and later analysis.
  • Lambda: We developed code capable of processing batches of events from both Kafka and SQS, redirecting them to the destination configured by the user on the platform.

These were the technologies used to develop the Event Pipe and Event Store systems. Next, we discuss each system in more technical detail.

 

Event Pipe: The Event Routing System

The Event Pipe is responsible for orchestrating the flow of data between source systems and configurable destinations, ensuring that each client receives only the events relevant to their context. To achieve this, it includes an Event Pipe management system, which allows clients to configure their own event pipelines, defining filters and choosing the destinations where events will be forwarded. This management system is just one part of the Event Pipe, serving as the interface where clients can customize how their events are handled.

 

The Event Pipe was built focusing on three main stages:

 

Receive Events

The Event Pipe consumes data from multiple heterogeneous sources, including:

Kafka: We use AWS’s MSK to manage multiple Kafka topics.

SNS/SQS: For events published directly to Amazon SNS topics and processed via SQS.

Filter and Enrich Events

Each received event goes through a filtering process based on rules defined by clients. These rules can be configured to select events based on any data present in the event body, ensuring greater flexibility and control over the information flow.
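As a rough illustration of body-based filtering, the sketch below evaluates a list of rules against an event's fields using dotted paths. The rule shape (`field`/`op`/`value`) and the operators shown are assumptions for the example, not the platform's real configuration format.

```python
def matches(event: dict, rules: list) -> bool:
    """True when the event satisfies every configured rule.

    Each rule is a hypothetical {"field", "op", "value"} triple;
    `field` is a dotted path into the event body.
    """
    for rule in rules:
        value = event
        for key in rule["field"].split("."):
            if not isinstance(value, dict) or key not in value:
                return False  # missing field fails the rule
            value = value[key]
        if rule["op"] == "eq" and value != rule["value"]:
            return False
        if rule["op"] == "in" and value not in rule["value"]:
            return False
    return True

event = {"type": "ad.click", "data": {"country": "BR"}}
rules = [
    {"field": "type", "op": "eq", "value": "ad.click"},
    {"field": "data.country", "op": "in", "value": ["BR", "PT"]},
]
print(matches(event, rules))  # True
```

Because rules address arbitrary paths in the event body, clients can filter on any attribute their events carry without code changes on the platform side.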

Forward Events

After processing, the events are routed to their destinations via Kafka and can be sent to:

  • Event Store
  • Webhook
  • Google Spreadsheet

As illustrated in the diagram above:

  1. KSQLDB for event routing: We use KSQLDB to group events from multiple Kafka topics and forward them to a single topic, called RouteEventCommand. This topic centralizes all events belonging to our clients and emitted by Kafka, facilitating their subsequent processing.
  2. Integration with SQS: For events published to SNS topics, we use a queue in SQS, called RouteEventsQueue. This queue receives the events and transforms them into messages, ensuring that they can be processed later by the Lambda responsible for routing.
  3. Routing with EventRouter: The EventRouter, our AWS Lambda, is responsible for forwarding the events to their destinations, as configured previously by users in the Event Pipe management system.
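A sketch of what the EventRouter's input normalization might look like: AWS invokes a Lambda with different payload shapes depending on the trigger, so a router consuming both MSK and SQS has to handle both. The handler body here is a placeholder; the actual destination dispatch is omitted.

```python
import base64
import json

def extract_events(trigger: dict) -> list:
    """Normalize a Lambda invocation payload into a list of events.

    MSK delivers base64-encoded values under `records`, while SQS
    delivers JSON strings under `Records[].body`.
    """
    events = []
    if "records" in trigger:  # MSK / Kafka batch
        for batch in trigger["records"].values():
            for record in batch:
                payload = base64.b64decode(record["value"])
                events.append(json.loads(payload))
    elif "Records" in trigger:  # SQS batch
        for record in trigger["Records"]:
            events.append(json.loads(record["body"]))
    return events

def handler(trigger, context=None):
    # Route each event to the destination configured for its pipe;
    # the dispatch itself (webhook POST, spreadsheet append, Event
    # Store write) is left out of this sketch.
    for event in extract_events(trigger):
        print("routing", event.get("type"))

sqs_trigger = {"Records": [{"body": json.dumps({"type": "ad.click"})}]}
print(len(extract_events(sqs_trigger)))  # 1
```

Normalizing both sources into one event list lets a single routing code path serve the Kafka and SNS/SQS ingestion legs described above.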

Event Store: The Raw and Flexible Event Storage

The Event Store is a data storage location for clients who wish to retain events for future analysis or integration with other systems. It was designed to provide:

Standardized Storage in CNCF Format

  • All stored events follow the CloudEvents standard, widely used in the industry, ensuring interoperability and support for external tools.
  • We chose to store events in JSONL (JSON Lines) format, which facilitates batch processing and allows for export to CSV when needed.
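To show what this storage format looks like in practice, here is a CloudEvents-shaped record serialized as JSON Lines, one event per line. The attribute values are invented for the example; only the `specversion`, `id`, `source`, `type`, and `time` attribute names come from the CloudEvents spec.

```python
import json

# Serialize CloudEvents-shaped dicts as JSON Lines, one per line,
# mirroring how the Event Store persists files.
events = [
    {
        "specversion": "1.0",
        "id": "evt-001",
        "source": "/ads/campaign-42",
        "type": "ad.click",
        "time": "2025-08-19T12:00:42Z",
        "data": {"country": "BR"},
    },
]

jsonl = "\n".join(json.dumps(e, separators=(",", ":")) for e in events)

# Reading back is line-by-line, which is what makes JSONL friendly
# to batch processing and streaming conversion to CSV.
parsed = [json.loads(line) for line in jsonl.splitlines()]
print(parsed[0]["type"])  # ad.click
```

One self-describing event per line means a consumer can process a file of any size with constant memory, and a CSV export is a straightforward per-line transform.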

Ease of Integration with Data Lakes 

  • We included support for automatically transferring events stored in the Event Store to the client’s data lake if they wish.

Flexibility for Queries and Downloads

  • We implemented an API that allows clients to search, filter, and download their stored data. The options include:
      ◦ Complete download of JSONL (JSON Lines) or CSV files.
      ◦ Dynamic queries based on parameters such as type, origin, or time period.
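The query semantics above can be sketched as a simple in-memory filter over stored events. The parameter names here are illustrative, not the real API contract, and the sample events are invented.

```python
from datetime import datetime, timezone

def query_events(events, event_type=None, source=None, start=None, end=None):
    """Filter stored events by type, origin, and time period.

    A minimal in-memory sketch of the query options the API exposes.
    """
    def keep(e):
        if event_type and e.get("type") != event_type:
            return False
        if source and e.get("source") != source:
            return False
        t = datetime.fromisoformat(e["time"].replace("Z", "+00:00"))
        if start and t < start:
            return False
        if end and t > end:
            return False
        return True

    return [e for e in events if keep(e)]

stored = [
    {"type": "ad.click", "source": "/ads/1", "time": "2025-08-19T12:00:00Z"},
    {"type": "ad.view", "source": "/ads/1", "time": "2025-08-19T11:00:00Z"},
]
clicks = query_events(stored, event_type="ad.click")
print(len(clicks))  # 1
```

In production the same filters would be applied server-side before the download is assembled, so clients only pull the slice of data they need.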

The following diagram gives a more technical view of how this solution was built:

  1. In this section, we have a Kafka topic called SendToEventStoreCommand, which is responsible for receiving events that should be stored in an Event Store. To process these events, we developed a Kafka Connector that groups them by account and type over a one-hour window. After this interval, the events are consolidated into a file and stored in Amazon S3.
  2. Once in S3, these files need to be mapped in our database so that clients can manage them. To achieve this, we use EventBridge, which monitors S3 events. For every creation, update, or deletion of a file, the Lambda EventStreamEntityUpdater is triggered to synchronize the data in RDS. We call these records in our database Event Streams, meaning that within an Event Store, we store Event Streams.
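A rough sketch of what EventStreamEntityUpdater's synchronization logic could look like: a handler reacting to EventBridge S3 notifications and keeping a record per file. A plain dict stands in for the RDS table, and the payload shapes follow EventBridge's S3 `Object Created` / `Object Deleted` events; the key layout is an invented example.

```python
# In-memory stand-in for the RDS table of Event Stream records.
event_streams = {}

def handler(event, context=None):
    """Sync the Event Stream records with S3 object notifications."""
    detail = event["detail"]
    key = detail["object"]["key"]
    if event["detail-type"] == "Object Created":
        # Upsert a record for the new or updated file.
        event_streams[key] = {"size": detail["object"].get("size", 0)}
    elif event["detail-type"] == "Object Deleted":
        # Drop the record when the file is removed from S3.
        event_streams.pop(key, None)

handler({
    "detail-type": "Object Created",
    "detail": {"object": {"key": "acct-1/ad.click/2025-08-19-12.jsonl",
                          "size": 2048}},
})
print(list(event_streams))  # ['acct-1/ad.click/2025-08-19-12.jsonl']
```

Driving the database purely from S3 notifications means the object store stays the source of truth: any create, update, or delete, from whatever component, converges the records the same way.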

Given this process, some questions arise: are these Event Streams stored indefinitely? Is there an expiration time for these events? What is the real value of very old events for our customers?

With this in mind, we designed a strategy to automatically remove Event Streams according to the settings defined by the user, which we detail below:

To remove very old events that have possibly lost their value, we developed the solution as illustrated in the diagram above. In the Event Stores management system, the user can configure a TTL (Time To Live) for the stored events. This way, we only remove the files that the user considers expired within their business context.


We can see in the diagram:

  1. With TTL configured and events being stored, a Lambda called EventStreamScanner periodically monitors the Event Streams of each Event Store, identifying those that have expired. When it finds expired events, this Lambda publishes a command to the CleanExpiredEventStreamCommand topic, which, in turn, triggers an SQS queue responsible for calling the Lambda tasked with removal.
  2. When the command is processed, the Lambda ExpiredEventStreamsCleaner is executed and permanently removes the file from Amazon S3. After deletion, as described in the previous diagram, the S3 DeleteObject event triggers the Lambda EventStreamEntityUpdater, ensuring synchronization of the RDS database.
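The scanner's core check can be sketched as a comparison of each stream's creation time against a TTL cutoff. The data shapes are assumptions for the example; in production the scanner would publish one CleanExpiredEventStreamCommand per match rather than return a list.

```python
from datetime import datetime, timedelta, timezone

def find_expired(streams, ttl_hours, now=None):
    """Return the keys of Event Streams older than the configured TTL.

    `streams` maps a hypothetical S3 key to its creation time.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=ttl_hours)
    return [key for key, created in streams.items() if created < cutoff]

streams = {
    "acct-1/ad.click/old.jsonl": datetime(2025, 8, 1, tzinfo=timezone.utc),
    "acct-1/ad.click/new.jsonl": datetime(2025, 8, 19, tzinfo=timezone.utc),
}
expired = find_expired(streams, ttl_hours=72,
                       now=datetime(2025, 8, 19, 12, tzinfo=timezone.utc))
print(expired)  # ['acct-1/ad.click/old.jsonl']
```

Keeping the scan, the removal command, and the actual deletion as separate steps lets each stage retry independently, which is the point of routing the command through a topic and an SQS queue.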

 

Thus, our solution combines the principles of Event-Driven Design with a robust ecosystem of AWS technologies to provide clients with full control over their events. The Event Pipe ensures an efficient flow of events between multiple sources and configurable destinations, while the Event Store offers a reliable repository for storing historical events in a structured manner. Additionally, the implementation of an automated event expiration system ensures that stored data remains relevant, preventing unnecessary accumulation and optimizing resources. With this approach, we deliver a scalable, flexible solution aligned with our clients’ business needs.

 

What We Learned

Throughout the development and refinement of our systems, we identified challenges and opportunities that shaped our technical decisions. Scalability, standardization, and end-user experience were critical points that guided our choices. Below, we highlight some of the key lessons learned and the solutions adopted to optimize our processes:

Scalability in Event Processing

The volume of events our systems need to process can vary drastically depending on each client’s load. To handle traffic spikes, we adopted:

  • Lambda: A serverless component that scales horizontally to meet demand.
  • SQS/SNS: A message queue that decouples event ingestion from processing, ensuring greater resilience and control over data flow.

Standards as Adoption Facilitators

The decision to adopt the CNCF standard was strategic. This simplified the integration of external tools and reduced the effort for clients to consume the stored data. It also enabled the reuse of events in multiple contexts, such as internal analysis, BI integrations, and data science.

Reduction of Complexity for the End User

By allowing clients to configure custom pipelines directly via the interface, we simplified a process that previously required developers to write code or configure systems. This automation was achieved with well-designed APIs and an efficient metadata system.

By combining modern technologies such as Kafka, SNS, SQS, KSQLDB, and AWS Lambda, we were able to build a scalable, flexible, and customer-centric system. The result is an event pipeline that puts control in the hands of clients, allowing them to extract real value from their own data without technical friction.

Written by: Bruno Farias, Vanderson Arruda, Vitor Lourenço

Discover more from Blue Media Services

Subscribe now to keep reading and get access to the full archive.

Continue reading