Sponsored by: Sync Computing | Best Practices to Manage Databricks Clusters at Scale to Lower Costs
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Breakout |
TRACK | Data Engineering and Streaming |
INDUSTRY | Enterprise Technology, Health and Life Sciences, Financial Services |
TECHNOLOGIES | Apache Spark, ETL, Governance |
SKILL LEVEL | Beginner |
DURATION | 40 min |
Many companies quickly scale their Databricks usage to thousands of jobs, only to find themselves with ballooning costs and difficult-to-manage infrastructure. Platform teams often find themselves gridlocked with other groups and priorities, unable to act to resolve these problems. At Sync, we’ve worked with companies from startups to the Fortune 100 on their Databricks usage, identifying common trends and pitfalls. In this talk, we’ll present common findings on which practices work and which don’t for Jobs clusters, SQL warehouses, all-purpose compute clusters, and ML workloads. We’ve observed companies save up to 75% on their Databricks spend by implementing various techniques to optimize performance. We’ll also present Sync’s automated Databricks management solution, Gradient, which can automate many of the lessons covered here to help companies bring down costs at scale.
SESSION SPEAKERS
Jeff Chou
CEO / Co-Founder
Sync Computing