Spark Right-Sizing: The Secret to Saving Millions of Dollars at LinkedIn
Overview
Detail | Value |
---|---|
Experience | In Person |
Type | Lightning Talk |
Track | Data Warehousing |
Industry | Enterprise Technology, Media and Entertainment |
Technologies | Apache Spark |
Skill Level | Intermediate |
Duration | 20 min |
At LinkedIn, we run over 400,000 Spark applications every day, consuming 200+ PBHrs of compute daily. Manually configuring Spark's memory tuning options led to low memory utilization and frequent out-of-memory (OOM) errors, so we developed an automated system that right-sizes Spark executor memory. The system is policy-based and uses nearline and real-time feedback loops to automate memory tuning, which allocates resources more efficiently, improves user productivity, and increases job reliability. Using historical utilization data and real-time error classification, it adjusts memory dynamically, significantly narrowing the gap between allocated and utilized memory while reducing failures. This initiative has increased memory utilization by 13% and cut OOM-related job failures by 90%, saving thousands of PBHrs of compute every year.
Session Speakers
Shreyesh Arangath
Senior Software Engineer
LinkedIn