Data + AI Summit 2023
JUNE 26-29, 2023
SAN FRANCISCO + VIRTUAL

Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest

Wednesday, June 29 @2:50 PM

Overview

Apache Spark has become Pinterest’s dominant distributed batch processing framework. As the Spark 3 era arrives, most of Pinterest’s Spark applications still run on Spark 2, and Pinterest is migrating its Spark platform and most production Spark jobs to Spark 3. In this talk, we’ll share how Pinterest performed the Spark 3 migration at scale. Moving to Spark 3 is a major version upgrade that brings many incompatibilities and significant differences compared with Spark 2. We’ll first introduce the motivation for the migration, then cover the major challenges, the approaches we took, how we handled different Spark job types during the migration, how we addressed incompatibilities between Spark 2 and Spark 3, such as Scala version support, and how we efficiently and safely migrated our existing production Spark jobs at scale without impacting stability or SLOs, with the help of the Auto Migration Service (AMS). We’ll then discuss the performance improvements and cost savings achieved so far, as well as our future plans and the improvements we intend to work on.

After attending this session, you’ll have a better understanding of the challenges involved in performing the Spark 2 to Spark 3 migration at scale. You’ll also be able to draw on the experiences and considerations shared in this session to move your users to Spark 3 in a smooth and stable manner.

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Difficulty

  • Intermediate

Room

  • Moscone South | Level 2 | 215

Duration

  • 35 min

Session Speakers

Zaheen Aziz

Software Engineer

Pinterest

Zirui Li

Software Engineer

Pinterest
