Developing Applications with Apache Spark™

Master scalable data processing with Apache Spark in this hands-on course. Learn to build efficient ETL pipelines, perform advanced analytics, and optimize distributed data transformations using Spark’s DataFrame API. Explore grouping, aggregation, joins, set operations, and window functions. Work with complex data types like arrays, maps, and structs while applying best practices for performance optimization.

Note: Databricks Academy is transitioning from video lectures to a more streamlined PDF format with slides and notes for all self-paced courses. Demo videos will remain available in their original format. We would love to hear your thoughts on this change, so please share your feedback through the course survey at the end. Thank you for being a part of our learning community!

Languages Available: English | 日本語 | 한국어

Skill Level

Associate

Duration

Prerequisites

- Basic programming knowledge

- Familiarity with Python

- Basic understanding of SQL queries (SELECT, JOIN, GROUP BY)

- Familiarity with data processing concepts

- "Introduction to Apache Spark Course" or previous Databricks experience required

Self-Paced

Custom-fit learning paths for data, analytics, and AI roles and career paths through on-demand videos

Registration options

Databricks has a delivery method for wherever you are on your learning journey

Self-Paced

Custom-fit learning paths for data, analytics, and AI roles and career paths through on-demand videos

Instructor-Led

Public and private courses taught by expert instructors, ranging from half a day to two days

Blended Learning

Self-paced and weekly instructor-led sessions for every style of learner to optimize course completion and knowledge retention. Go to the Subscriptions Catalog tab to purchase.

Skills@Scale

Comprehensive training offering for large-scale customers that includes learning elements for every style of learning. Inquire with your account executive for details.

Upcoming Public Classes

Generative AI Engineer

AI Agent Fundamentals

This foundational course introduces AI agents and their use in enterprise applications on Databricks, including the Mosaic AI platform and Agent Bricks. Learners will examine what AI agents are, how they function, and how they mimic human reasoning to handle complex tasks.

The course covers real-world agent use cases and provides a basic introduction to advanced topics such as agentic workflows and multi-agent systems. It also explores how Agent Bricks simplifies the development of enterprise-ready agents across various applications, with demos showing how to build and use agents on Databricks.

Languages Available: English | 日本語 | Português BR | 한국어

Free

1h 30m

Introductory

Generative AI Engineer

Prompt Engineering Fundamentals

This course equips end users to get the most from their organization’s AI-powered assistants. Learners begin by understanding how the assistant works, its capabilities and limitations, and how to create precise, business-relevant prompts by applying the COIE framework: Context, Outcome, Instruction, Example. They then progress to prompting techniques such as zero-shot, few-shot, chain-of-thought, self-ask, and meta-prompting to handle diverse content, reasoning, and workflow tasks. By the end, participants can structure complex requests, chain prompts effectively, and collaborate with their AI assistant as a reliable reasoning partner that delivers accurate, actionable insights.

Free

Introductory

Data Engineer

Get Started with Databricks for Data Engineering

In this course, you will learn basic skills that will allow you to use the Databricks Data Intelligence Platform to perform a simple data engineering workflow and support data warehousing endeavors. You will be given a tour of the workspace and be shown how to work with objects in Databricks such as catalogs, schemas, volumes, tables, compute clusters, and notebooks. You will then follow a basic data engineering workflow to perform tasks such as creating and working with tables, ingesting data into Delta Lake, transforming data through the medallion architecture, and using Databricks Workflows to orchestrate data engineering tasks. You’ll also learn how Databricks supports data warehousing needs through the use of Databricks SQL, Lakeflow Spark Declarative Pipelines, and Unity Catalog.

Free

Onboarding