Apache Spark™ Programming with Databricks

This course serves as an appropriate entry point to learn Apache Spark Programming with Databricks.

Below, we describe each of the four, four-hour modules included in this course.

Introduction to Apache Spark

This course offers essential knowledge of Apache Spark, with a focus on its distributed architecture and practical applications for large-scale data processing. Participants will explore programming frameworks, learn the Spark DataFrame API, and develop skills for reading, writing, and transforming data using Python-based Spark workflows.

Developing Applications with Apache Spark

Master scalable data processing with Apache Spark in this hands-on course. Learn to build efficient ETL pipelines, perform advanced analytics, and optimize distributed data transformations using Spark’s DataFrame API. Explore grouping, aggregation, joins, set operations, and window functions. Work with complex data types like arrays, maps, and structs while applying best practices for performance optimization.

Stream Processing and Analysis with Apache Spark

Learn the essentials of stream processing and analysis with Apache Spark in this course. Gain a solid understanding of stream processing fundamentals and develop applications using the Spark Structured Streaming API. Explore advanced techniques such as stream aggregation and window analysis to process real-time data efficiently. This course equips you with the skills to create scalable and fault-tolerant streaming applications for dynamic data environments.

Monitoring and Optimizing Apache Spark Workloads on Databricks

This course explores the Lakehouse architecture and Medallion design for scalable data workflows, focusing on Unity Catalog for secure data governance, access control, and lineage tracking. The curriculum includes building reliable, ACID-compliant pipelines with Delta Lake. You'll examine Spark optimization techniques, such as partitioning, caching, and query tuning, and learn performance monitoring, troubleshooting, and best practices for efficient data engineering and analytics to address real-world challenges.

Languages Available: English | 日本語

Skill Level

Associate

Duration

16h

Prerequisites

Prerequisites

Basic programming knowledge
Familiarity with Python
Basic understanding of SQL queries (SELECT, JOIN, GROUP BY)
Familiarity with data processing concepts
No prior Spark or Databricks experience required

Outline

Introduction to Apache Spark

Spark Runtime Architecture

Exploring Apache Spark Architecture in Databbricks

Introduction to Spark DataFrames and SQL

Reading and Writing Data with DataFrames

Distributed System Programming Fundamentals

Basic ETL with the DataFrame API

Flight Data ETL with the DataFrame API

Analyzing Transaction Data with DataFrames

Developing Applications with Apache Spark

DataFrame API Basics

Demo: (Optional) Basic ETL with the DataFrame API

Grouping and Aggregating Data

Demo: Grouping and Aggregating Data

Lab: Grouping and Aggregating E-Commerce Data

Relational Operations

Demo: Data Relational Operations in Apache Spark

Working with Complex Data

Demo: Working with Complex Data Types in Apache Spark

Lab: Working with Complex Data Types in E-Commerce Data

Stream Processing and Analysis with Apache Spark

Introduction to Stream Processing

Spark Structured Streaming

Demo: Introduction to Spark Structured Streaming

Lab: Introduction to Spark Structured Streaming

Advanced Stream Processing and Analysis

Demo: Window Aggregation in Spark Structured Streaming

Lab: Window Aggregation in Spark Structured Streaming

Monitoring and Optimizing Apache Spark Workloads on Databricks

Apache Spark and Databricks

Using Apache Spark with Delta Lake

Demo: Introduction to Delta Lake

Lab: Introduction to Delta Lake

Optimizing Apache Spark

Demo: Optimizing Apache Spark

Lab: Optimizing Apache Spark

Upcoming Public Classes

Date	Time	Language	Price
Date	Time	Language	Price	Jul 28 - 31	02 PM - 06 PM (Europe/Paris)	English	$1500.00
Jul 28 - 31	02 PM - 06 PM (America/New_York)	English	$1500.00
Aug 11 - 12	09 AM - 05 PM (Europe/Paris)	English	$1500.00
Aug 18 - 21	11 AM - 03 PM (Asia/Singapore)	English	$1500.00
Aug 18 - 21	02 PM - 06 PM (Europe/Paris)	English	$1500.00
Aug 18 - 21	02 PM - 06 PM (America/New_York)	English	$1500.00
Sep 08 - 09	09 AM - 05 PM (Australia/Sydney)	English	$1500.00
Sep 08 - 09	09 AM - 05 PM (Europe/Paris)	English	$1500.00
Sep 08 - 09	09 AM - 05 PM (America/Los_Angeles)	English	$1500.00
Sep 15 - 18	11 AM - 03 PM (Asia/Singapore)	English	$1500.00
Sep 15 - 18	02 PM - 06 PM (Europe/Paris)	English	$1500.00
Sep 15 - 18	02 PM - 06 PM (America/New_York)	English	$1500.00
Oct 01 - 02	09 AM - 05 PM (America/Los_Angeles)	English	$1500.00
Oct 06	09 AM - 05 PM (Australia/Sydney)	English	$1500.00
Oct 06 - 07	09 AM - 05 PM (Europe/Paris)	English	$1500.00
Oct 20 - 23	02 PM - 06 PM (Europe/Paris)	English	$1500.00
Oct 20 - 23	02 PM - 06 PM (America/New_York)	English	$1500.00
Oct 27 - 30	11 AM - 03 PM (Asia/Singapore)	English	$1500.00

Public Class Registration

If your company has purchased success credits or has a learning subscription, please fill out the Training Request form. Otherwise, you can register below.

Customer registration Partner registration

Private Class Request

If your company is interested in private training, please submit a request.

Request Private Training

See all our registration options

Registration options

Databricks has a delivery method for wherever you are on your learning journey

Self-Paced

Custom-fit learning paths for data, analytics, and AI roles and career paths through on-demand videos

Instructor-Led

Public and private courses taught by expert instructors across half-day to two-day courses

Blended Learning

Self-paced and weekly instructor-led sessions for every style of learner to optimize course completion and knowledge retention. Go to Subscriptions Catalog tab to purchase

Purchase now

Skills@Scale

Comprehensive training offering for large scale customers that includes learning elements for every style of learning. Inquire with your account executive for details

Upcoming Public Classes

Data Engineer

Databricks Performance Optimization

In this course, you’ll learn how to optimize workloads and physical layout with Spark and Delta Lake and and analyze the Spark UI to assess performance and debug applications. We’ll cover topics like streaming, liquid clustering, data skipping, caching, photons, and more.

Note: This course is part of the 'Advanced Data Engineering with Databricks' course series.

Languages Available: English | 日本語 | Português BR | 한국어

Data Analysis with Databricks

This course provides a comprehensive introduction to Databricks SQL. Learners will ingest data, write queries, produce visualizations and dashboards, and configure alerts. This course will prepare you to take the Databricks Certified Data Analyst Associate exam.

This course consists of two four-hour modules.

SQL Analytics on Databricks

In this course, you'll learn how to effectively use Databricks for data analytics, with a specific focus on Databricks SQL. As a Databricks Data Analyst, your responsibilities will include finding relevant data, analyzing it for potential applications, and transforming it into formats that provide valuable business insights.

You will also understand your role in managing data objects and how to manipulate them within the Databricks Data Intelligence Platform, using tools such as Notebooks, the SQL Editor, and Databricks SQL.

Additionally, you will learn about the importance of Unity Catalog in managing data assets and the overall platform. Finally, the course will provide an overview of how Databricks facilitates performance optimization and teach you how to access Query Insights to understand the processes occurring behind the scenes when executing SQL analytics on Databricks.

AI/BI for Data Analysts

In this course, you’ll learn how to use the features Databricks provides for business intelligence needs: AI/BI Dashboards and AI/BI Genie. As a Databricks Data Analyst, you will be tasked with creating AI/BI Dashboards and AI/BI Genie Spaces within the platform, managing the access to these assets by stakeholders and necessary parties, and maintaining these assets as they are edited, refreshed, or decommissioned over the course of their lifespan. This course intends to instruct participants on how to design dashboards for business insights, share those with collaborators and stakeholders, and maintain those assets within the platform. Participants will also learn how to utilize AI/BI Genie Spaces to support self-service analytics through the creation and maintenance of these environments powered by the Databricks Data Intelligence Engine.

Languages Available: English | 日本語 | Português BR | 한국어

Data Warehousing Practitioner

Data Warehousing with Databricks

This course is designed for data professionals who want to explore the data warehousing capabilities of Databricks. Assuming no prior knowledge of Databricks, it provides an introduction to leveraging Databricks as a modern cloud-based data warehousing solution. Learners will explore how use the Databricks Data Intelligence Platform to ingest, transform, govern, and analyze data efficiently using the industry-standard TCP-DI dataset as a reference. Learners will also explore Genie, an innovative Databricks feature that simplifies data exploration through natural language queries. By the end of this course, participants will be equipped with the foundational skills to implement and optimize a data warehouse using Databricks.

Languages Available: English | 日本語 | Português BR | 한국어