Near Real-Time Media Analytics with Lakeflow Spark Declarative Pipelines
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Enterprise Technology, Communications, Media & Entertainment |
| Technologies | Lakeflow, Unity Catalog, Lakebase |
| Skill Level | Intermediate |
In digital-first media, analytics latency directly drives editorial, revenue and audience decisions. For global publishers, sub-second insight into content performance shapes how teams react to breaking news, tentpole events and monetisation windows. This session shares how Condé Nast — a global media company with 115+ years of publishing history — is re-architecting media analytics on Databricks, moving from a fragmented, high-latency stack to a near real-time lakehouse powered by Lakeflow Spark Declarative Pipelines and Lakebase.
Our legacy stack delivered analytics with a 15–30 minute delay across InfluxDB, Qlik and rigid third-party tools. That lag cost us revenue yield on trending stories, delayed subscription conversions, forced duplicated business logic across systems and left blind spots between top-of-funnel content and downstream drivers like commerce and MAI. The new architecture ingests directly from web collectors into Databricks. Lakeflow Spark Declarative Pipelines unify streaming and batch, run on-the-fly aggregations and enforce consistent business logic at scale. Lakebase powers low-latency operational serving, while the lakehouse remains the system of record.
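As a rough illustration of the on-the-fly aggregation step, the core idea is bucketing raw page-view events into short tumbling windows and counting per article. This minimal plain-Python sketch uses hypothetical event fields (`ts`, `article_id`) and a 10-second window chosen only to echo the latency target; the production path would express the equivalent logic declaratively in Lakeflow Spark Declarative Pipelines on Databricks, not in application code like this.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # illustrative tumbling-window size (assumption, not Condé Nast's actual setting)

def window_start(ts: float) -> float:
    """Align an epoch timestamp to the start of its tumbling window."""
    return ts - (ts % WINDOW_SECONDS)

def aggregate_page_views(events):
    """Count page views per (window_start, article_id).

    A toy stand-in for the streaming aggregation a declarative
    pipeline would run continuously over the web-collector feed.
    Each event is assumed to look like:
        {"ts": <epoch seconds>, "article_id": <str>}
    """
    counts = defaultdict(int)
    for ev in events:
        counts[(window_start(ev["ts"]), ev["article_id"])] += 1
    return dict(counts)

# Example: two views of article "a" land in the 100–110 s window,
# one view of "b" lands in the 110–120 s window.
events = [
    {"ts": 100.0, "article_id": "a"},
    {"ts": 104.0, "article_id": "a"},
    {"ts": 112.0, "article_id": "b"},
]
print(aggregate_page_views(events))  # → {(100.0, 'a'): 2, (110.0, 'b'): 1}
```

In a real pipeline the same grouping would be a windowed streaming aggregation with watermarking for late events, which is exactly the kind of logic the declarative framework keeps in one place instead of duplicating across tools.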
The result: end-to-end latency under 10 seconds, third-party black-box vendors phased out, and a single unified view across audience, subscription, affiliate and ad data. We will also explore extending this with the Databricks Genie API to bring LLM-powered text-to-query into real-time media workflows.
Session Speakers
Jitendra Sharma
Senior Data Engineering Manager
Condé Nast Publications
Arun Karthik
Director, Data Solutions Engineering
Condé Nast