Near Real-Time Media Analytics with Lakeflow Spark Declarative Pipelines
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Enterprise Technology, Communications, Media & Entertainment |
| Technologies | Lakeflow, Unity Catalog, Lakebase |
| Skill Level | Intermediate |
In digital-first media, analytics latency directly drives editorial, revenue and audience decisions. For global publishers, sub-second insight into content performance shapes how teams react to breaking news, tentpole events and monetisation windows. This session shares how Condé Nast — a global media company with 115+ years of publishing history — is re-architecting media analytics on Databricks, moving from a fragmented, high-latency stack to a near real-time lakehouse powered by Lakeflow Spark Declarative Pipelines and Lakebase.
Our legacy stack delivered analytics with a 15–30 minute delay across InfluxDB, Qlik and rigid third-party tools. That lag cost us revenue yield on trending stories, delayed subscription conversions, forced duplicated business logic across systems and left blind spots between top-of-funnel content and downstream drivers like commerce and MAI. The new architecture ingests directly from web collectors into Databricks. Lakeflow Spark Declarative Pipelines unify streaming and batch, run on-the-fly aggregations and enforce consistent business logic at scale. Lakebase powers low-latency operational serving, while the lakehouse remains the system of record.
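As a rough illustration of the on-the-fly aggregation step, the core idea is bucketing raw page-view events into short tumbling windows and counting per article. This minimal plain-Python sketch uses hypothetical event fields (`ts`, `article_id`) and a 10-second window chosen only to echo the latency target; the production path would express the equivalent logic declaratively in Lakeflow Spark Declarative Pipelines on Databricks, not in application code like this.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # illustrative tumbling-window size (assumption, not Condé Nast's actual setting)

def window_start(ts: float) -> float:
    """Align an epoch timestamp to the start of its tumbling window."""
    return ts - (ts % WINDOW_SECONDS)

def aggregate_page_views(events):
    """Count page views per (window_start, article_id).

    A toy stand-in for the streaming aggregation a declarative
    pipeline would run continuously over the web-collector feed.
    Each event is assumed to look like:
        {"ts": <epoch seconds>, "article_id": <str>}
    """
    counts = defaultdict(int)
    for ev in events:
        counts[(window_start(ev["ts"]), ev["article_id"])] += 1
    return dict(counts)

# Example: two views of article "a" land in the 100–110 s window,
# one view of "b" lands in the 110–120 s window.
events = [
    {"ts": 100.0, "article_id": "a"},
    {"ts": 104.0, "article_id": "a"},
    {"ts": 112.0, "article_id": "b"},
]
print(aggregate_page_views(events))  # → {(100.0, 'a'): 2, (110.0, 'b'): 1}
```

In a real pipeline the same grouping would be a windowed streaming aggregation with watermarking for late events, which is exactly the kind of logic the declarative framework keeps in one place instead of duplicating across tools.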
The result: end-to-end latency under 10 seconds, third-party black-box vendors phased out, and a single unified view across audience, subscription, affiliate and ad data. We will also explore extending this with the Databricks Genie API to bring LLM-powered text-to-query into real-time media workflows.
Session Speakers
Jitendra Sharma
Senior Data Engineering Manager
Condé Nast Publications
Arun Karthik
Director, Data Solutions Engineering
Condé Nast