Data + AI Summit 2022

Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest

Wednesday, June 29 @2:50 PM

Overview

Apache Spark has become Pinterest’s dominant distributed batch processing framework. Although the Spark 3 era is arriving, most of Pinterest’s Spark applications still run on Spark 2, so Pinterest is migrating its Spark platform and most production Spark jobs to Spark 3. In this talk, we’ll share how Pinterest performed the Spark 3 migration at scale. Moving to Spark 3 is a major version upgrade that brings many incompatibilities and significant differences compared with Spark 2. We’ll first introduce the motivation for the migration, then cover the major challenges, the approaches we took, how we handled different Spark job types, how we addressed incompatibilities between Spark 2 and Spark 3 such as Scala version support, and how we efficiently and safely migrated our existing production Spark jobs at scale, without impacting stability or SLOs, with the help of the Auto Migration Service (AMS). We’ll then discuss the performance improvements and cost savings achieved so far, as well as our future plans and the improvements we’ll work on.

After attending this session, you’ll have a better understanding of the challenges involved in performing the Spark 2 to Spark 3 migration at scale. You’ll also be able to apply the experiences and considerations shared in this session to move your own users to Spark 3 in a smooth and stable manner.

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Difficulty

  • Intermediate

Room

  • Moscone South | Level 2 | 215

Duration

  • 35 min

Session Speakers

Zaheen Aziz

Software Engineer

Pinterest

Zirui Li

Software Engineer

Pinterest
