Migrating and Optimizing Large-Scale Streaming Applications with Databricks (repeat)


TRACK: Data Engineering and Streaming
INDUSTRY: Media and Entertainment
TECHNOLOGIES: Apache Spark, Developer Experience, Orchestration
SKILL LEVEL: Intermediate

This session is repeated.


Our large-scale streaming application processes hundreds of billions of ad events daily at over 5 GB/s. It transforms, joins, and routes these ad events to hundreds of heterogeneous destinations, enabling real-time analytics, batch reporting, ML-based forecasting, and streaming ad log delivery for programmatic ad campaigns. In this session, we will discuss how we rearchitected, redeveloped, and migrated this massive application of over 30K lines of code to a Databricks Spark Structured Streaming architecture. We'll share lessons learned, cover the substantial benefits gained, and detail how we enhanced performance through memory-related optimizations, Kinesis parameter tuning, parallelizing the output stage within each micro-batch, and other tweaks. We'll introduce FreeWheel, programmatic advertising, the architecture of the larger data platform that incorporates this streaming application, and our monitoring and observability solution. Finally, we'll highlight several Databricks features that enhanced our development experience, such as the Databricks Assistant.
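One of the optimizations the abstract mentions is parallelizing the output stage within each micro-batch. A minimal sketch of that general pattern is below; it is not the speakers' code, and the destination names and `write_to_destination` helper are hypothetical placeholders for real sink writes (e.g., a `foreachBatch` handler fanning out to tables or streams):

```python
# Illustrative sketch: instead of writing a micro-batch to each destination
# sequentially, submit the per-destination writes to a thread pool so slow
# sinks no longer serialize the whole batch.
from concurrent.futures import ThreadPoolExecutor

def write_to_destination(batch, destination):
    # Hypothetical placeholder for a real sink write
    # (e.g., batch_df.write.saveAsTable(destination) inside foreachBatch).
    return f"wrote {len(batch)} events to {destination}"

def process_micro_batch(batch, destinations, max_workers=8):
    # Parallelize the output stage: each destination write runs on its
    # own thread; results are collected so failures surface per sink.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(write_to_destination, batch, d)
                   for d in destinations]
        return [f.result() for f in futures]

results = process_micro_batch([1, 2, 3],
                              ["analytics", "reporting", "ml_forecast"])
```

In a real Spark job the batch DataFrame would typically be cached before the fan-out so each destination write does not recompute the micro-batch.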


Sharif Doghmi

Lead Software Engineer
FreeWheel, A Comcast Company

Donghui Li

Lead Software Engineer
FreeWheel, A Comcast Company