Scalable Machine Learning with Apache Spark™
This course teaches you how to scale ML pipelines with Spark, including distributed training, hyperparameter tuning, and inference. You will build and tune ML models with SparkML while using MLflow to track, version, and manage them. The course covers recent ML features in Apache Spark, such as pandas UDFs, pandas function APIs, and the pandas API on Spark, as well as Databricks ML offerings such as Feature Store and AutoML.
This course will prepare you to take the Databricks Certified Machine Learning Associate exam.
Duration
2 full days or 4 half days
Objectives
Perform scalable EDA with Spark
Build and tune machine learning models with SparkML
Track, version, and deploy models with MLflow
Perform distributed hyperparameter tuning with Hyperopt
Use the Databricks Machine Learning workspace to create a Feature Store and AutoML experiments
Leverage the pandas API on Spark to scale your pandas code
Prerequisites
Intermediate experience with Python (or completion of Introduction to Python for Data Science & Data Engineering)
Familiarity with the PySpark DataFrame API (or completion of Apache Spark Programming)
Experience building machine learning models
Outline
Spark / ML overview
Exploratory data analysis (EDA) and feature engineering with Spark
SparkML: transformers, estimators, pipelines, and evaluators
MLflow Tracking and Model Registry
Parallelizable hyperparameter tuning
Databricks AutoML and Feature Store
Integrating third-party packages (distributed XGBoost)
Distributed inference of scikit-learn models with pandas UDFs
Distributed training with the pandas function API
Pandas API on Spark for data manipulation
Public Class Registration
If your company has purchased success credits or has a learning subscription, please fill out the public training request form. Otherwise, you can register below.
Private Class Delivery
If your organization would like to request a private delivery of the course, please fill out the request form below.