
Unwrapping the secrets of a data + AI career

Jody Soeiro de Faria
Kathryn Kearney
Sean Park
Trang Le

Upskilling through role-based pathways to accelerate your data + AI career

Databricks has spent years crafting and iterating on technical training for learners across data, analytics, and AI disciplines so that individuals, teams, and organizations that want to upskill or reskill have accessible and relevant content. With the rapid growth of AI/ML and of roles in data, analytics, and AI, many organizations need to adopt new technology faster than before. It's predicted that 97 million jobs involving AI will be created between 2022 and 2025. This presents a unique challenge: upskilling talent in a scalable way.

Elevate your career today with Databricks' Learning Festival

Databricks' virtual Learning Festival is a unique opportunity to upskill and reskill across data engineering, data science, and data analytics courses built for our customers, prospects, and partners. This event will provide access to free self-paced, role-based content. Those who successfully complete the self-paced training will be eligible to receive a 50%-off Databricks certification voucher (more details below).

Learning objectives across self-paced courses

1: Data Engineer Course - Data Engineering with Databricks

This course prepares data professionals to leverage the Databricks Data Intelligence Platform to productionalize ETL pipelines. Students will use Delta Live Tables to define and schedule pipelines that incrementally process new data from a variety of data sources into the platform. Students will also orchestrate tasks with Databricks Workflows and promote code with Databricks Repos.

Learning Objectives:

  • Use the Databricks Data Science and Engineering Workspace to perform common code development tasks in a data engineering workflow.
  • Use Spark SQL or PySpark to extract data from a variety of sources, apply common cleaning transformations, and manipulate complex data with advanced functions.
  • Define and schedule data pipelines that incrementally ingest and process data through multiple tables in the lakehouse using Delta Live Tables in Spark SQL or Python.
  • Orchestrate data pipelines with Databricks Workflow Jobs and schedule dashboard updates to keep analytics up to date.
  • Configure permissions in Unity Catalog to ensure that users have proper access to databases for analytics and dashboarding.

Enrollment Link

2: Data Engineer Course - Advanced Data Engineering with Databricks

In this course, students will build upon their existing knowledge of Apache Spark, Structured Streaming, and Delta Lake to unlock the full potential of the Databricks Data Intelligence Platform by utilizing the suite of tools provided by Databricks. This course places a heavy emphasis on designs favoring incremental data processing, enabling systems optimized to continuously ingest and analyze ever-growing data. By designing workloads that leverage built-in platform optimizations, data engineers can reduce the burden of code maintenance and on-call emergencies, and quickly adapt production code to new demands with minimal refactoring or downtime. The topics in this course should be mastered before attempting the Databricks Certified Data Engineer Professional exam.

Learning Objectives:

  • Design databases and pipelines optimized for the Databricks Data Intelligence Platform.
  • Implement efficient incremental data processing to validate and enrich data driving business decisions and applications.
  • Leverage Databricks-native features for managing access to sensitive data and fulfilling right-to-be-forgotten requests.
  • Manage code promotion, task orchestration, and production job monitoring using Databricks tools.

Enrollment Link

3: Data Analyst Course - Data Analysis with Databricks SQL

This course provides a comprehensive introduction to Databricks SQL. It is designed to support individuals preparing for the Databricks Certified Data Analyst Associate certification. Participants will learn how to ingest data, write queries, produce visualizations and dashboards, and connect Databricks SQL to additional tools by using Partner Connect.

Learning Objectives:

  • Describe how Databricks SQL works in the Lakehouse architecture
  • Integrate Unity Catalog and Delta Lake with Databricks SQL
  • Describe how Databricks SQL implements data security
  • Query data in Databricks SQL
  • Use SQL commands specific to Databricks
  • Create visualizations and dashboards in Databricks SQL
  • Use automation and integration capabilities in Databricks SQL
  • Share queries and dashboards with others using Databricks SQL

Enrollment Link

4: Machine Learning Practitioner Course - Scalable Machine Learning with Apache Spark

This course teaches you how to scale ML pipelines with Spark, including distributed training, hyperparameter tuning, and inference. You will build and tune ML models with SparkML while leveraging MLflow to track, version, and manage these models. This course covers the latest ML features in Apache Spark, such as pandas UDFs, pandas function APIs, and the pandas API on Spark, as well as the latest ML product offerings, such as Feature Store and AutoML.

Learning Objectives:

  • Perform scalable EDA with Spark
  • Build and tune machine learning models with SparkML
  • Track, version, and deploy models with MLflow
  • Perform distributed hyperparameter tuning with Hyperopt
  • Use the Databricks Machine Learning workspace to create a Feature Store and AutoML experiments
  • Leverage the pandas API on Spark to scale your pandas code

Enrollment Link

5: Machine Learning Practitioner Course - Machine Learning in Production

In this course, you will learn MLOps best practices for putting machine learning models into production. The first half of the course uses a feature store to register training data and uses MLflow to track the machine learning lifecycle, package models for deployment, and manage model versions. The second half of the course examines production issues including deployment paradigms, monitoring, and CI/CD. By the end of this course, you will have built an end-to-end pipeline to log, deploy, and monitor machine learning models.

Learning Objectives:

  • Track, version, and manage machine learning experiments.
  • Leverage Databricks Feature Store for reproducible data management.
  • Implement strategies for deploying models for batch, streaming, and real-time inference.
  • Build monitoring solutions, including drift detection.

Enrollment Link

There are 4 more Learning Plans offered as part of the Databricks Learning Festival.

How to become eligible for the Databricks certification voucher

A 50%-off Databricks certification voucher¹ will be given to the first 5,000 users who complete at least one of the role-based courses during the virtual Learning Festival.

¹ The remaining US $100 can be paid through Webassessor at the time of exam registration, by credit card only.


  1. Only one voucher will be given, whether the learner completes one or multiple course(s) / learning plan(s).
  2. The voucher will be valid for 6 months, after which it expires.
  3. The voucher is applicable for the following exams only:
    • Databricks Certified Data Engineer Associate
    • Databricks Certified Data Engineer Professional
    • Databricks Certified Data Analyst Associate
    • Databricks Certified Machine Learning Associate
    • Databricks Certified Machine Learning Professional
  4. The voucher will be distributed one to two weeks after the event closes.
  5. The certification voucher cannot be combined with other offers or success credits.

Have questions? Ask in the Databricks Community: Databricks Academy Learners Group

Begin upskilling and reskilling today with Databricks Academy through the virtual Databricks Learning Festival
