Apache Spark Core—Deep Dive—Proper Optimization

Optimizing Spark jobs through a true understanding of Spark Core. Learn: What is a partition? What is the difference between read, shuffle, and write partitions? How do you increase parallelism and decrease the number of output files? Where does shuffle data go between stages? What is the “right” size for your Spark partitions and files? Why does a job slow down with only a few tasks left and never finish? Why doesn’t adding nodes decrease my compute time?


See More Spark + AI Summit in San Francisco 2019 Videos

About Daniel Tomes

Daniel Tomes leads the Resident Solutions Architect Practice at Databricks and is responsible for vertical integration, productization, and strategic client growth. His big data journey began in 2014 at a major oil and gas company, after which he moved to Cloudera for two years as a Solutions Architect. He joined Databricks in 2017.