Data + AI Summit 2022

Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest

Wednesday, June 29 @2:50 PM

Overview

Apache Spark has become Pinterest’s dominant distributed batch processing framework. As the Spark 3 era arrives, most of Pinterest’s Spark applications still run on Spark 2, and Pinterest is migrating its Spark platform and most production Spark jobs to Spark 3. In this talk, we’ll share how Pinterest performed the Spark 3 migration at scale. Moving to Spark 3 is a major version upgrade that introduces many incompatibilities and significant differences from Spark 2. We’ll first introduce the motivation for the migration, then cover the major challenges, the approaches we took, how we handled different Spark job types, how we addressed incompatibilities between Spark 2 and Spark 3 such as Scala version support, and how we migrated our existing production Spark jobs efficiently and safely at scale, without impacting stability or SLO, with the help of the Auto Migration Service (AMS). We’ll then discuss the performance improvements and cost savings achieved so far, as well as the future plans and improvements we’ll work on.

After attending this session, you’ll have a better understanding of the challenges involved in performing the Spark 2 to Spark 3 migration at scale. You’ll also be able to apply the experiences and considerations shared in this session to move your users to Spark 3 in a smooth and stable manner.

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Difficulty

  • Intermediate

Room

  • Moscone South | Level 2 | 215

Duration

  • 35 min

Session Speakers

Zaheen Aziz

Software Engineer

Pinterest

Zirui Li

Software Engineer

Pinterest
