Optimizing Apache Spark™ on Databricks
In this course, you will explore the five key problems that represent the vast majority of performance issues in an Apache Spark application: skew, spill, shuffle, storage, and serialization. With examples based on 100 GB to 1+ TB datasets, you will investigate and diagnose sources of bottlenecks with the Spark UI and learn effective mitigation strategies. You will also discover new features introduced in Spark 3 that can automatically address common performance problems. Lastly, you will learn how to design and configure clusters for optimal performance based on specific team needs and concerns.
Outline
Day 1
- Review of Spark architecture and Spark UI
- Skew
- Spill
- Shuffle
- Storage
- Serialization
Day 2
- Ingestion basics
- Predicate pushdowns
- Disk partitioning
- Z-ordering
- Bucketing
- Optimization with Adaptive Query Execution (AQE)
- Designing and configuring clusters for high performance
Upcoming Public Classes
Date | Time | Language | Price |
---|---|---|---|
Jul 31 - Aug 01 | 09 AM - 05 PM (Australia/Sydney) | English | $1500.00 |
Aug 05 - 06 | 09 AM - 05 PM (America/New_York) | English | $1500.00 |
Aug 21 - 22 | 09 AM - 05 PM (Australia/Sydney) | English | $1500.00 |
Aug 27 - 30 | 02 PM - 06 PM (Europe/Paris) | English | $1500.00 |
Public Class Registration
If your company has purchased success credits or has a learning subscription, please fill out the Training Request form. Otherwise, you can register below.
Private Class Request
If your company is interested in private training, please submit a request.
Registration options
Databricks has a delivery method for wherever you are on your learning journey.
Self-Paced
Custom-fit learning paths for data, analytics, and AI roles and career paths through on-demand videos.
Register now
Instructor-Led
Public and private courses taught by expert instructors, ranging from half-day to two-day formats.
Register now
Blended Learning
Self-paced and weekly instructor-led sessions for every style of learner to optimize course completion and knowledge retention. Go to the Subscriptions Catalog tab to purchase.
Purchase now
Skills@Scale
Comprehensive training offering for large-scale customers that includes learning elements for every style of learning. Inquire with your account executive for details.