
Helping make smarter marketing investments

As little as 5 minutes to process 140GB of data, down from as much as 6 hours


“Databricks makes it easy for our analysts to explore, visualize, and share data. And since we’re federating all the data with Databricks, we know we’re working with the most complete data sets available.”

— Barbara Darrow, Director of Data Management, Neustar

Creating trusted connections with customers requires engaging with them where it matters most. To that end, Neustar’s mission is to help their customers shape digital marketing strategies through data-driven insights that boost conversion. However, a high variety of messy data and a legacy analytics platform that was rigid and difficult to scale slowed Neustar’s ability to do this effectively. With Databricks, Neustar has standardized the data coming in from clients and improved collaboration across their data teams with a unified approach to analytics in the cloud. Today, Neustar is able to offer end-to-end insights that improve advertising spend at the scale and speed their customers expect.

Non-standardized data is slow to process and hard to scale

These days, there’s no shortage of places for advertisers to put their media dollars: TV, radio, Facebook, podcasts, Twitter, TikTok — the list goes on. And while it’s certainly always nice to have choices, the more we have, the more difficult it can be to make the right one.

Part of Neustar’s mission is to take the mystery out of ad spend by helping clients mine their data for insights into how best to slice and dice dollars. The challenge with client data, however, is that it’s often messy and inconsistent. Neustar’s data teams were spending too much time on the front end cleaning and standardizing data sets that ranged from super small — because they came directly from the advertiser — to massive, system-generated data dumps.

“The challenge has always been to process messy client data in a timely and effective manner, but we also needed it to be in a standardized format,” explained Barbara Darrow, Neustar’s Director of Data Management. “Without standardization, we were spending all our time rewriting code we’d previously written in order to make it usable again.”

Neustar’s inability to scale or easily process data made it clear that they needed to move off their rigid and complex legacy platform, but migrating brought its own challenges: the learning curve was steep, staff had to be retrained in a variety of skills (e.g., Python, PySpark, SDLC processes), and the entire existing codebase had to be converted.

With no existing version control or collaboration capabilities to support multiple teams across the organization, finding a tool that would both ease the migration and deliver actionable insights from the same data sets was a tall order.

Unified and streamlined data analytics boosts productivity and insight

Databricks on AWS made integration with Neustar’s homegrown data processing application, Meridian, a breeze. From there, provisioning clusters was straightforward, and building robust automation with the REST API required little training while simplifying both data engineering and DevOps.
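
For readers curious what cluster automation over the REST API can look like, here is a minimal sketch in Python. It is not Neustar’s code: the workspace URL, token, cluster name, runtime version and instance type are all placeholder assumptions, and the helper `build_create_cluster_request` is a hypothetical name. The sketch only builds a request for the Clusters API `clusters/create` endpoint and does not send it.

```python
import json
import urllib.request


def build_create_cluster_request(host, token, cluster_name, num_workers):
    """Build (but do not send) a cluster-creation request for the Clusters API."""
    payload = {
        "cluster_name": cluster_name,
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version (assumption)
        "node_type_id": "i3.xlarge",          # placeholder AWS instance type (assumption)
        "num_workers": num_workers,
        "autotermination_minutes": 30,        # shut the cluster down when idle
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace URL and token, for illustration only.
request = build_create_cluster_request(
    "https://example.cloud.databricks.com", "dapi-EXAMPLE-TOKEN", "nightly-etl", 4
)
# urllib.request.urlopen(request) would submit it against a real workspace.
```

Keeping request construction separate from sending makes the automation easy to review and test before it touches a live workspace.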

Meanwhile, Delta Lake addressed Neustar’s messy data issue by providing reliable, consistent data and a boost in the performance of data pipelines. The output is then easily fed into interactive notebooks for exploration and analytics, and those insights are then fed to the customer. With actionable insights served to the business through intuitive dashboards, Neustar’s analysts and business teams are able to help their clients make smarter digital advertising decisions that result in increased engagement and conversion.
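
To make the idea of standardizing incoming client data concrete, here is a small sketch in plain Python. It does not use Spark or Delta Lake and is not Neustar’s pipeline; the function names and the sample record are hypothetical. It simply shows one common first step: normalizing inconsistent column headers and stray whitespace so that downstream code can rely on a single format.

```python
import re


def standardize_column(name):
    """Normalize one column header to lowercase snake_case."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def standardize_record(record):
    """Return a copy of a row with standardized keys and trimmed string values."""
    return {
        standardize_column(key): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


# Hypothetical messy client row, for illustration only.
raw = {" Campaign Name ": " Spring Sale ", "Ad-Spend ($)": 1200.5, "Channel": "TV"}
clean = standardize_record(raw)
print(clean)
```

Applying the same normalization to every incoming file means transformation code can be written once against the standardized names instead of being rewritten for each client.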

“Databricks makes it easy for our analysts to explore, visualize and share data. And since we’re federating all the data with Databricks, we know we’re working with the most complete data sets available,” said Darrow.

As for collaborative working and learning, the whole team was able to use their programming language of choice within the same notebooks, boosting cross-team collaboration. With Databricks, overall team productivity has increased, allowing them to work faster and more efficiently.

The whole process, which previously took two weeks or more, now requires fewer people and less time.

Solid data processing is just the beginning

Databricks provided Neustar with the full gamut of functionality they needed to process data for analytics at scale, resulting in faster development, testing, and deployment.

“One particular client required us to process 140 gigabytes of data,” said Darrow. “Because there was no way to run it on a cluster, any kind of transformation on this size data set used to take somewhere around 4–6 hours. With Databricks, we’ve reduced that down to 5–15 minutes.”

As for the future, Darrow added that Neustar plans to look into using Databricks for more advanced analytics and machine learning. “We have started our journey in the right direction, and all the downstream optimizations could result in new data-driven innovations.”
