
Serving enhanced video advertisement insights to clients

MediaRadar | Vivvix automates ad classification with Databricks Mosaic AI

2 days to 4 hours

Time saved experimenting with Mosaic AI Playground

150%

Increase in ads categorized per hour

SOLUTION: Model Serving
PLATFORM USE CASE: Generative AI, Data Streaming
CLOUD: Azure

“Databricks supercharged our data operations by providing high-speed processing power with robust scalability, seamless integration and second-to-none technical support. Leveraging Databricks Mosaic AI tools has provided us strategic insights with remarkable speed and precision.”

— Abhinav Kothari, CTO, MediaRadar | Vivvix

MediaRadar | Vivvix provides comprehensive advertising intelligence that empowers brands to understand and optimize their ad spending across all media channels. Initially, MediaRadar | Vivvix faced challenges with manual data processing and fragmented workflows that made it hard to keep pace with rapid growth in ad volumes. By adopting Databricks Mosaic AI and Apache Spark™ Structured Streaming, they revolutionized their operations, automating ad classification and streamlining data management on a unified platform. This transformation enabled MediaRadar | Vivvix to process thousands of ads per hour, significantly improving the accuracy and efficiency of the insights delivered to customers, and paving the way for innovations like real-time data processing and advanced model training. The Databricks Data Intelligence Platform has transformed MediaRadar | Vivvix’s ability to handle massive data volumes and deliver precise insights at scale.

Manual data processing impedes valuable ad insights for clients

MediaRadar | Vivvix, an advertising intelligence company, helps brand marketers and advertising agencies understand their competitive landscape and the impact of their media spends. Their primary mission is to discern “what, where and when” for their clients, which involves identifying the brand, products and even notable details like featured celebrities in ads.

Prior to Databricks, MediaRadar | Vivvix faced bottlenecks with their previous setup based on Amazon Simple Queue Service (SQS), which required manual polling for data and returned at most 10 messages at a time. Together, these constraints severely hindered the company’s ability to meet service level agreements (SLAs). The nature of their task, classifying over 6 million unique products, posed “an extreme classification problem,” as Dong-Hwi Kim, MediaRadar | Vivvix’s Senior Machine Learning (ML) Engineer, put it. Existing ML models were inadequate due to the sheer scale and diversity of the products in their database. Kim pointed out, “We created a fine-tuned ML model in-house, but even that wasn’t sufficient enough. We didn’t have training data to support the millions of products we had.”

The company’s ad classification process relied on manual labor. Hundreds of operators were tasked with watching ads and recording detailed information on their elements, which was incredibly time-consuming and could not keep pace with the rapid increase in volume. “For the past couple of years, the overall ad spend by companies has almost doubled. To scale alongside this, it’s not really feasible to increase the number of people to match,” Kim noted. This necessitated an automated solution, leveraging AI and machine learning to handle real-time ad classification more efficiently.

Thierry Steenberghs, Principal Software Engineer at MediaRadar | Vivvix, highlighted the difficulties of managing the data and ensuring the smooth operation of their models. “One of the nightmares that we were having was the fragmented structure. We had a bunch of pods that were running different components, and then we needed to log in to Azure to see what was happening. It was very hard to see if everything was working properly,” Steenberghs explained. “We got the data, then we had to build a model, export it and then import it. It was a lot of wasted time.”

With the adoption of Spark Structured Streaming, data processing is now automated, enabling continuous and real-time data ingestion without manual intervention. This shift not only eliminated concerns about meeting SLAs but also allowed MediaRadar | Vivvix to efficiently process thousands of ads per hour. The result is a streamlined workflow that gives MediaRadar | Vivvix the confidence that they will deliver precise and timely insights to customers ranging from small brands to major industry leaders.
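Structured Streaming processes incoming data in micro-batches and records in a checkpoint which input offsets have already been handled, so a restarted query resumes where it left off instead of relying on hand-written polling. The stdlib-only sketch below simulates that offset-tracking loop to illustrate the idea. It is not the Spark API: `MicroBatchReader`, `run_once`, and the in-memory list standing in for a source are hypothetical. In real PySpark the corresponding pieces would be `spark.readStream`, a `checkpointLocation` option, and a trigger.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MicroBatchReader:
    """Hypothetical stand-in for a streaming source with a committed offset.

    `source` is an append-only log (e.g. ad records arriving from vendors);
    `committed` is the offset up to which records have been processed.
    """
    source: List[str]
    committed: int = 0
    max_per_batch: int = 1000  # cap on records per micro-batch (assumption)

    def next_batch(self) -> List[str]:
        # Read only records that arrived after the last committed offset.
        end = min(len(self.source), self.committed + self.max_per_batch)
        return self.source[self.committed:end]

    def commit(self, n: int) -> None:
        # Advance the "checkpointed" offset once the batch has been processed.
        self.committed += n


def run_once(reader: MicroBatchReader, process: Callable[[List[str]], None]) -> int:
    """Run one trigger: process the next micro-batch and return its size."""
    batch = reader.next_batch()
    if batch:
        process(batch)
        reader.commit(len(batch))
    return len(batch)
```

For example, with 250 queued ads and a cap of 100 per batch, repeated triggers process batches of 100, 100 and 50; if two more ads are then appended to the log, the next trigger picks up exactly those two, because processing always starts from the committed offset.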

Accelerating video processing and ad classification with GenAI

Databricks Mosaic AI and Spark Structured Streaming played a crucial role in automating the ad classification process, significantly enhancing efficiency and scalability. The MediaRadar | Vivvix team leveraged Ray clusters on Databricks to optimize video processing and scale the classification of video ads across millions of categories. In this dual-layer approach, GenAI first identified the products in each ad, and those results were then compared with the predictions of MediaRadar | Vivvix’s own classification models. The Databricks Platform was used again to select the best match from the combined predictions, improving accuracy in identifying the correct products.
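The case study does not say how the best match is chosen from the combined predictions. As a minimal sketch, assuming each layer returns candidate product IDs with confidence scores in [0, 1], one simple rule is a weighted average of the two scores plus a bonus when both layers propose the same product. The function name, weights, bonus and threshold below are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Scores = Dict[str, float]  # product ID -> confidence in [0, 1]


def select_best_match(
    genai: Scores,
    in_house: Scores,
    w_genai: float = 0.5,          # weight for the GenAI layer (assumption)
    agreement_bonus: float = 0.2,  # bonus when both layers agree (assumption)
    min_score: float = 0.3,        # minimum combined score to accept (assumption)
) -> Optional[Tuple[str, float]]:
    """Combine two candidate sets; return (product_id, score) or None.

    A candidate missing from one layer counts as 0 for that layer.
    Returns None when there are no candidates or none clears `min_score`.
    """
    combined: Scores = {}
    for pid in genai.keys() | in_house.keys():
        g, h = genai.get(pid, 0.0), in_house.get(pid, 0.0)
        score = w_genai * g + (1.0 - w_genai) * h
        if pid in genai and pid in in_house:
            score += agreement_bonus
        combined[pid] = score
    if not combined:
        return None
    # Sorting first makes tie-breaking deterministic (smallest ID wins).
    best = max(sorted(combined), key=combined.get)
    return (best, combined[best]) if combined[best] >= min_score else None
```

The agreement bonus encodes the intuition behind a dual-layer design: a product proposed independently by both the GenAI layer and the in-house model is more likely to be correct than one proposed by either alone.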

The integration of Databricks streamlined MediaRadar | Vivvix’s data processing workflow, making it more efficient and scalable. Data from video ads was ingested into the Databricks Data Intelligence Platform for seamless integration and preprocessing. “By moving everything into Databricks, we don’t have silos. We can monitor the data as it gets in from all our different sources, including from vendors,” Steenberghs said, a far cry from the “nightmare” he previously experienced. “We can monitor the data transformation and we can see the training. We can even measure the performance of the models. And it’s all right there in one platform.”

Steenberghs’ team developed preprocessing pipelines that included fingerprinting to identify duplicates, transcription and translation using models like Whisper, and optical character recognition (OCR) to extract text from the ads. These preprocessing steps were crucial for preparing the data for classification and for keeping their large database as clean as possible.
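The article does not specify the fingerprinting method. One common, simple choice for spotting near-duplicate video frames is a perceptual average hash (aHash): downscale a frame to a small grayscale grid, set one bit per pixel depending on whether it is brighter than the frame’s mean, and compare hashes by Hamming distance. The sketch below assumes frames are already downscaled (for example to 8×8) and uses a hypothetical distance threshold; it illustrates the technique, not MediaRadar | Vivvix’s actual pipeline.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # small grayscale grid, values 0-255


def average_hash(frame: Frame) -> int:
    """Perceptual average hash: one bit per pixel, set if pixel > mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedupe(frames: List[Frame], max_distance: int = 5) -> List[int]:
    """Return indices of frames kept after dropping near-duplicates.

    A frame counts as a duplicate if its hash is within `max_distance` bits
    of an already-kept frame (the threshold is an assumption to tune).
    """
    kept_hashes: List[int] = []
    kept: List[int] = []
    for i, frame in enumerate(frames):
        h = average_hash(frame)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept_hashes.append(h)
            kept.append(i)
    return kept
```

Because each bit depends only on whether a pixel is above the frame mean, a uniform brightness shift that does not clip leaves the hash unchanged. This is why perceptual fingerprints can catch re-encoded or slightly altered copies of an ad that an exact byte-level hash would miss.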

Initial experimentation was conducted using the Mosaic AI Model Serving environment. This facilitated rapid prototyping and testing of various machine learning models. To manage costs effectively, the team chose to use OpenAI’s GPT-3.5 model, balancing performance and expense. This decision allowed them to process thousands of creative assets daily without incurring prohibitive costs.

Serving clients better with a 150% increase in hourly throughput

The integration of Databricks brought numerous benefits to MediaRadar | Vivvix, transforming their operations and enabling them to meet their goals more effectively. One of the key advantages was the significant improvement in processing speed and scalability. Steenberghs illustrated this by comparing their performance before and after Databricks. “Before, we were able to do at most around 800 creatives an hour. With our move to Databricks, we’re classifying about 2,000 an hour.” This remarkable increase in throughput underscored the platform’s ability to handle large volumes of data efficiently.
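The 150% figure in the heading follows from Steenberghs’ numbers, taking roughly 800 creatives an hour before the move and about 2,000 after:

```latex
\[
\text{relative increase} = \frac{2000 - 800}{800} = \frac{1200}{800} = 1.5 = 150\%
\]
```

Equivalently, the new hourly throughput is 2,000 / 800 = 2.5 times the old rate.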

Another significant benefit was the reduction in manual intervention and the associated administrative overhead. With Databricks, MediaRadar | Vivvix can automate many tasks that previously required extensive human involvement. “It saves me time with model experimentation and testing. What would normally take two days’ worth of work now takes maybe half a day,” Kim remarked.

In addition, Databricks enabled MediaRadar | Vivvix to implement a more agile and adaptable development process. Steenberghs highlighted the platform’s flexibility and transparency: “I love the agility. I love that Databricks is very transparent. I love the fact that I can mix SQL and Python as I need to.” This adaptability allowed his team to continuously refine their models and stay ahead in the fast-evolving field of ad classification.

Steenberghs also expressed excitement about the implementation of Databricks Unity Catalog, which is expected to enhance their data management and security — for them and for their customers. “We are moving to Unity Catalog, which will help us stay secure,” Steenberghs said. “It’s currently hard for us to enforce security, so it will be nice to still have open access while being able to restrict who can see what.”

Overall, the partnership with Databricks provided MediaRadar | Vivvix with a powerful, scalable solution that not only addressed their immediate challenges but also positioned them for future growth and innovation. The implementation of GenAI exemplifies their forward-thinking approach, enabling MediaRadar | Vivvix to serve ad insights faster and with more accuracy.

About MediaRadar | Vivvix

To learn more about how MediaRadar | Vivvix’s Ad Intelligence Platform gives customers a greater understanding of the competitive media landscape to drive more effective decisions, visit