Saving Millions From Millions: Navigating Towards Cost-Efficiency in Pinterest's Spark Jobs
Overview
Experience | In Person |
---|---|
Type | Breakout |
Track | Data Engineering and Streaming |
Industry | Enterprise Technology, Professional Services, Retail and CPG - Food |
Technologies | Apache Spark |
Skill Level | Intermediate |
Duration | 40 min |
While Spark offers powerful processing capabilities for massive data volumes, cost efficiency remains a persistent challenge for users operating at large scale. At Pinterest, where we run millions of Spark jobs monthly, maintaining infrastructure cost efficiency is crucial to supporting our rapid business growth.
To tackle this challenge, we have developed several strategies that have saved us tens of millions of dollars across numerous job instances. We will share our analytical methodology for identifying performance bottlenecks and the technical solutions we used to overcome various challenges. Our approach includes extracting insights from billions of collected metrics, leveraging remote shuffle services to address shuffle slowness, and improving memory utilization to reduce costs while hosting hundreds of millions of pods.
This presentation aims to prompt further discussion of Apache Spark cost efficiency in the community and to help the community tackle this common challenge.
Session Speakers
Nan Zhu
Staff Software Engineer
Pinterest