Near Real-Time Netflix Recommendations Using Apache Spark Streaming


As a data-driven company, we use machine-learning-based algorithms and A/B tests to drive all of the content recommendations for our members. Traditionally, these recommendations are precomputed in batch, but such a model cannot react quickly to member interactions, title interests, popularity, etc. With an ever-growing Netflix catalog, finding the right content for our audience in near real time would provide the best personalized experience.

We’ll take a deep dive into our real-time Spark Streaming ecosystem at Netflix, covering both its infrastructure and its business use cases. On the infrastructure front, we will delve into scale challenges, state management, data persistence, resiliency considerations, metrics, operations, and auto-remediation. We will talk about a few use cases that leverage real-time data for model training, such as providing the right personalized videos in a member’s Billboard and choosing the right personalized image soon after the launch of a show. We will also reflect on the lessons learnt while building such high-volume infrastructure.

Session hashtag: #ML7SAIS

About Nitin Sharma

Nitin is a Senior Software Engineer on the Personalization Infrastructure team at Netflix. His primary focus is on building various ML infrastructure components using Apache Spark that help Netflix research engineers innovate faster and improve personalized recommendations. He is passionate about large-scale distributed systems, search platforms, and performance optimizations. He is an active open source contributor to Apache Solr and a few other Apache projects.

About Elliot Chow

Elliot is a software engineer at Netflix on the Personalization Infrastructure team. He graduated from UC Berkeley (B.S.) and Stanford (M.S.) and has previously worked at eBay and Apple. Currently, he builds big data systems using a variety of technologies including Scala, Spark (Streaming), Kafka, and Cassandra.