Machine Learning at Scale
In this course, you will gain theoretical and practical knowledge of Apache Spark’s architecture and its application to machine learning workloads within Databricks. You will learn when to use Spark for data preparation, model training, and deployment, while also gaining hands-on experience with Spark ML and the pandas API on Spark. The course also introduces advanced concepts such as hyperparameter tuning and scaling Optuna with Spark, and it builds on features and concepts introduced in the associate course, such as MLflow and Unity Catalog, for comprehensive model packaging and governance.
Note: This course is the first in the Advanced Machine Learning series.
The content was developed for participants with these skills/knowledge/abilities:
• Familiarity with the Databricks Data Intelligence Platform and basic workspace operations (create clusters, run code in notebooks, use basic notebook operations, import repos from git).
• Intermediate programming experience with Python, including data manipulation libraries (pandas, numpy) and machine learning frameworks (scikit-learn).
• Basic knowledge of Apache Spark and PySpark fundamentals, including DataFrames, transformations, and actions for distributed data processing.
• Understanding of machine learning concepts, including model training, evaluation, hyperparameter tuning, and deployment workflows.
• Intermediate experience with Delta Lake operations (create tables, perform updates, optimize files, time travel functionality).
• Basic familiarity with MLflow for experiment tracking, model logging, and model registry operations.
• Understanding of distributed computing concepts (cluster architecture, parallelization, scalability considerations).
• Basic knowledge of SQL for data querying and manipulation within Spark environments.
Registration options
Databricks has a delivery method for wherever you are on your learning journey
Self-Paced
Custom-fit learning paths for data, analytics, and AI roles and career paths through on-demand videos
Instructor-Led
Public and private courses taught by expert instructors, ranging from half a day to two days
Blended Learning
Self-paced and weekly instructor-led sessions for every style of learner to optimize course completion and knowledge retention. Go to the Subscriptions Catalog tab to purchase
Skills@Scale
Comprehensive training offering for large-scale customers that includes learning elements for every style of learning. Inquire with your account executive for details

