Session

Spark Right-Sizing: The Secret to Saving Millions of Dollars at LinkedIn

Overview

Experience: In Person
Type: Lightning Talk
Track: Data Warehousing
Industry: Enterprise Technology, Media and Entertainment
Technologies: Apache Spark
Skill Level: Intermediate
Duration: 20 min

At LinkedIn, we run over 400,000 Spark applications a day, consuming 200+ PBHrs of compute. Manual configuration of Spark's memory tuning options led to low memory utilization and frequent out-of-memory (OOM) errors, so we developed an automated system for right-sizing Spark executor memory. Our policy-based approach uses nearline and real-time feedback loops to automate memory tuning, which leads to more efficient resource allocation, improved user productivity, and increased job reliability. By leveraging historical data and real-time error classification, we dynamically adjust memory, significantly narrowing the gap between allocated and utilized resources while reducing failures. This initiative has achieved a 13% increase in memory utilization and a 90% drop in OOM-related job failures, saving thousands of PBHrs of compute every year.
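The abstract does not describe the policy itself, so the following is only a minimal sketch of how two such feedback loops could fit together, not LinkedIn's implementation. Every name here (`Policy`, `nearline_recommendation`, `realtime_retry_memory`), the headroom and boost percentages, the memory bounds, and the error-message markers are assumptions chosen for illustration. The nearline loop sizes the next run from historical peak usage plus headroom; the real-time loop classifies a failure and raises memory for the retry only if it looks like an OOM.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Policy:
    """Hypothetical right-sizing policy; all values are illustrative."""
    headroom_pct: int = 20    # safety margin above the historical peak
    oom_boost_pct: int = 150  # retry memory as a percentage of the failed run
    min_mb: int = 1024        # lower bound on executor memory
    max_mb: int = 16384       # upper bound on executor memory


def _scale(value_mb: int, pct: int) -> int:
    """Scale by an integer percentage, rounding up (avoids float drift)."""
    return (value_mb * pct + 99) // 100


def _clamp(value_mb: int, policy: Policy) -> int:
    return max(policy.min_mb, min(policy.max_mb, value_mb))


def nearline_recommendation(peak_used_mb: Sequence[int], current_mb: int,
                            policy: Policy) -> int:
    """Nearline loop: size the next run from peak usage of past runs."""
    if not peak_used_mb:
        return current_mb  # no history yet, keep the existing setting
    target = _scale(max(peak_used_mb), 100 + policy.headroom_pct)
    return _clamp(target, policy)


def classify_failure(error_message: str) -> str:
    """Crude classifier; the substrings are assumed markers, not a full list."""
    markers = ("OutOfMemoryError", "exceeding memory limits")
    return "oom" if any(m in error_message for m in markers) else "other"


def realtime_retry_memory(current_mb: int, error_message: str,
                          policy: Policy) -> int:
    """Real-time loop: raise memory for the retry only on OOM failures."""
    if classify_failure(error_message) == "oom":
        return _clamp(_scale(current_mb, policy.oom_boost_pct), policy)
    return current_mb


policy = Policy()
next_mb = nearline_recommendation([3000, 3500, 3200], current_mb=8192, policy=policy)
retry_mb = realtime_retry_memory(next_mb, "java.lang.OutOfMemoryError: Java heap space", policy)
print(f"spark.executor.memory={next_mb}m, on OOM retry with {retry_mb}m")
```

In this sketch, a job allocated 8192 MB that never used more than 3500 MB would be scaled down, which is the "gap between allocated and utilized resources" the abstract refers to, while the retry path covers jobs that were sized too small.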

Session Speakers

Shreyesh Arangath

Senior Software Engineer
LinkedIn