Monitoring and Optimizing Apache Spark Workloads on Databricks

This course explores the Lakehouse architecture and Medallion design for scalable data workflows, focusing on Unity Catalog for secure data governance, access control, and lineage tracking. The curriculum includes building reliable, ACID-compliant pipelines with Delta Lake. You'll examine Spark optimization techniques such as partitioning, caching, and query tuning, and learn performance monitoring, troubleshooting, and best practices for efficient data engineering and analytics.

Note: Databricks Academy is transitioning all self-paced courses from video lectures to a more streamlined PDF format with slides and notes. Demo videos will remain available in their original format. We would love to hear your thoughts on this change, so please share your feedback through the course survey at the end. Thank you for being a part of our learning community!

Languages Available: English | 日本語 | 한국어

Skill Level

Associate

Duration

Prerequisites

Basic programming knowledge
Familiarity with Python
Basic understanding of SQL queries (SELECT, JOIN, GROUP BY)
Familiarity with data processing concepts
No prior Spark or Databricks experience required

Self-Paced

Registration options

Databricks has a delivery method for wherever you are on your learning journey.

Self-Paced

Custom-fit learning paths for data, analytics, and AI roles and career paths through on-demand videos

Instructor-Led

Public and private courses taught by expert instructors, ranging from half a day to two days

Blended Learning

Self-paced and weekly instructor-led sessions for every style of learner to optimize course completion and knowledge retention. Go to the Subscriptions Catalog tab to purchase.

Skills@Scale

Comprehensive training offering for large-scale customers that includes learning elements for every style of learning. Inquire with your account executive for details.

Upcoming Public Classes

Platform Administrator

Get Started with Data Governance on Databricks

In this course, you will explore how Unity Catalog enables secure, centralized data governance and fine-grained access control on Databricks. You will learn about table and volume types, catalog and schema configuration, group-based access management, and strategies for migrating existing access controls into Unity Catalog. The course also explains how to design and apply fine-grained controls such as row-level security, column masking, and attribute-based access control, how to combine these mechanisms across data and AI assets, and how to align them with broader governance requirements for compliant, scalable access management.

Free

Instructor-Led

Onboarding

Data Governance Fundamentals - Korean

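In this course, you will explore the fundamentals of data governance and Unity Catalog. You will learn about the evolution and purpose of data catalogs, the architecture and components of Unity Catalog, and the importance of data quality, security, compliance, lineage, and auditing in governance.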

Languages Available: English | 日本語 | 한국어

Free

45m

Introductory

Databricks Data Sharing & Collaboration

In this course, you will learn how to leverage Databricks' data sharing capabilities to enable secure, efficient collaboration across your organization and with external partners. The course covers the fundamentals of modern data sharing, exploring the business value it delivers and examining Databricks' three core solutions: Delta Sharing, Marketplace, and Clean Rooms. You'll understand the architecture of Delta Sharing, including both Databricks-to-Databricks (D2D) and open (D2O) sharing models, and learn how to share diverse asset types such as tables, AI models, and notebooks. The course further explores advanced replication strategies, cross-cloud and cross-region optimization, and implementation of granular access controls using dynamic views and partition filtering. You'll also master security and governance practices, including token management, fine-grained access control, cost optimization, and auditing strategies. By the end, you'll have the skills to design secure Delta Sharing solutions for internal and external use cases, optimize multi-party collaboration while managing costs, and choose the right Databricks sharing solution for your specific business needs.

Free

Associate