SESSION

Stranger Triumphs: Automating Spark Upgrades & Migrations at Netflix

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Engineering and Streaming
INDUSTRY: Enterprise Technology, Media and Entertainment
TECHNOLOGIES: Apache Spark, ETL, Orchestration
SKILL LEVEL: Intermediate
DURATION: 40

With Apache Spark™ 4 in the pipeline for this year, many of us are looking at what will be involved in upgrading to the latest and greatest Spark – not to mention the ever-evolving world of AI libraries. This talk examines how Netflix has automated large parts of our upgrade and how you can use these techniques for your data platform. We will share:

 

  • Our cool open source tools that rewrite Spark code
  • Our tools for testing Spark jobs in production
  • How we track the state of jobs
  • Re-using those same tools to migrate to a containerized environment
  • User experiences

 

In this session, you will learn how to upgrade your Spark pipelines without crying and how to validate Spark pipelines even when you don't trust the tests (by extending the write-audit-publish pattern). This talk is ideal for data scientists, ML engineers, platform engineers managing Spark infrastructure, and anyone who's inherited legacy data products.
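
For readers unfamiliar with the write-audit-publish pattern mentioned above, here is a minimal sketch of the basic idea in plain Python: write output to a staging location, audit it, and publish only if every audit passes. It is an illustration under stated assumptions, not the tooling presented in the talk. The in-memory `catalog` dict, the `__staging` suffix, and the names `write_audit_publish`, `non_empty`, and `no_null_ids` are all hypothetical; a real pipeline would stage to a table or branch in its own storage layer.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]


class AuditFailed(Exception):
    """Raised when staged output fails an audit and must not be published."""


def write_audit_publish(
    catalog: Dict[str, Table],
    target: str,
    produce: Callable[[], Table],
    audits: List[Callable[[Table], bool]],
) -> None:
    """Run a job, audit its output in staging, and publish only on success.

    `catalog` is a hypothetical stand-in for a table catalog.
    """
    staging = f"{target}__staging"
    # Write: land the output where downstream consumers do not read.
    catalog[staging] = produce()
    try:
        # Audit: every check must pass before anything becomes visible.
        for audit in audits:
            if not audit(catalog[staging]):
                raise AuditFailed(f"{audit.__name__} failed for {target}")
        # Publish: swap the audited output into the consumer-facing name.
        catalog[target] = catalog[staging]
    finally:
        # Clean up staging whether or not the publish happened.
        catalog.pop(staging, None)


def non_empty(rows: Table) -> bool:
    """Audit: the job produced at least one row."""
    return len(rows) > 0


def no_null_ids(rows: Table) -> bool:
    """Audit: every row carries a non-null `id`."""
    return all(row.get("id") is not None for row in rows)
```

With this shape, a job whose output fails an audit raises `AuditFailed` and leaves the previously published table untouched. One plausible way to apply it to an upgrade (an assumption, not a claim about the talk) is to add an audit that compares the staged output from the new Spark version against the currently published output.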

SESSION SPEAKERS

Holden Karau

Engineer
Netflix / Totally Legit Co

Robert Morck

Software engineer
Netflix