Instructor-Led Training

Advanced Data Engineering (2 days)

In this course, students will build upon their existing knowledge of Apache Spark™, Structured Streaming and Delta Lake to unlock the full potential of the data lakehouse by utilizing the suite of tools provided by Databricks. This course places a heavy emphasis on designs favoring incremental data processing, enabling systems optimized to continuously ingest and analyze ever-growing data.

Apache Spark™ Programming with Databricks (2 days)

This course explores the fundamentals of Spark programming on the Databricks platform, including Spark architecture, the DataFrame API, basic query optimization, Structured Streaming, and Delta Lake.

Data Analysis with Databricks SQL (1 day)

This course provides a comprehensive introduction to Databricks SQL. Students will write queries for data lakehouses, produce visualizations and dashboards, and set up automatic alerts for reporting to stakeholders.

Data Engineering with Databricks (2 days)

This course introduces best practices for using Databricks to build data pipelines through lectures and hands-on labs. Topics include data ingestion and processing techniques, building and executing data pipelines with Delta Live Tables and Databricks Workflows, and data governance with Unity Catalog. At the end of the course, you will have all the knowledge and skills that a data engineer would need to build an end-to-end Delta Lake pipeline for streaming and batch data.

Deep Learning with Databricks (2 days)

This course covers the fundamentals of neural networks with TensorFlow and how to scale training, inference, and hyperparameter tuning of deep learning models with Apache Spark.

Introduction to Python for Data Science & Data Engineering (2 days)

This course is intended for complete beginners to Python. It covers the basics of programmatically interacting with data using standard data manipulation and visualization libraries.

Machine Learning in Production (1 day)

This course is best taken after completing Scalable Machine Learning with Apache Spark™. It covers best practices for managing the machine learning lifecycle, from model creation to model management and serving. Students will explore ML operations concepts such as deployment paradigms and CI/CD methods, learn how to serve models through MLflow, and explore the effects of data drift.

Optimizing Apache Spark™ on Databricks (2 days)

This course helps experienced Apache Spark programmers understand the primary causes of Spark performance problems, use the Spark UI to identify root causes of performance issues, and apply mitigation techniques to increase performance and stability in Spark applications.

Scalable Machine Learning with Apache Spark™ (2 days)

This course teaches the full scalable data science workflow using Apache Spark ML, including data cleaning and exploration, feature engineering, model training, and hyperparameter tuning. By the end of this course, you will have built an end-to-end distributed machine learning pipeline ready for production.

See a class you are interested in?

View our public training schedule.

If your company has already purchased success credits or a learning subscription, please fill out the public training requests form. To request a private training class, please fill out the private training requests form. Otherwise, you can register directly on the Academy page below.

Questions?

If you have any questions, please refer to our Frequently Asked Questions page.