Databricks Certification and Badging

The new standard for lakehouse training and certifications

Databricks Certified Data Engineer Professional

The Databricks Certified Data Engineer Professional certification exam assesses an individual’s ability to use Databricks to perform advanced data engineering tasks. This includes an understanding of the Databricks platform and developer tools such as Apache Spark, Delta Lake, MLflow, and the Databricks CLI and REST API. The exam also assesses the ability to build optimized and cleaned ETL pipelines, to model data into a Lakehouse using general data modeling concepts, and to ensure that data pipelines are secure, reliable, monitored, and tested before deployment. Individuals who pass this exam can be expected to complete advanced data engineering tasks using Databricks and its associated tools.

Registration

To earn this certification, candidates must pass a certification exam. To register for the exam, please either log in to or create an account in our certification platform.

Learning Pathway

This certification is part of the Data Engineer learning pathway.


Exam Details

Key details about the certification exam are provided below.

Minimally Qualified Candidate

The minimally qualified candidate should be able to:

  • Understand how to use the Databricks platform and its tools, as well as the benefits of using them, including:
    • Platform (notebooks, clusters, Jobs, Databricks SQL, relational entities, Repos)
    • Apache Spark (PySpark, DataFrame API, basic architecture)
    • Delta Lake (SQL-based Delta APIs, basic architecture, core functions)
    • Databricks CLI (deploying notebook-based workflows)
    • Databricks REST API (configuring and triggering production pipelines; see the first sketch after this list)
  • Build data processing pipelines using the Spark and Delta Lake APIs (see the CDC sketch after this list), including:
    • Building batch-processed ETL pipelines
    • Building incrementally processed ETL pipelines
    • Optimizing workloads
    • Deduplicating data
    • Using Change Data Capture (CDC) to propagate changes
  • Model data management solutions (see the constraints sketch after this list), including:
    • Lakehouse (bronze/silver/gold architecture, databases, tables, views, and the physical layout)
    • General data modeling concepts (keys, constraints, lookup tables, slowly changing dimensions)
  • Build production pipelines using best practices around security and governance (see the dynamic view sketch after this list), including:
    • Managing notebook and jobs permissions with ACLs
    • Creating row- and column-oriented dynamic views to control user/group access
    • Securely storing personally identifiable information (PII)
    • Securely deleting data as requested under GDPR and CCPA
  • Configure alerting and storage to monitor and log production jobs (see the listener sketch after this list), including:
    • Setting up notifications
    • Configuring SparkListener
    • Recording logged metrics
    • Navigating and interpreting the Spark UI
    • Debugging errors
  • Follow best practices for managing, testing, and deploying code (see the unit test sketch after this list), including:
    • Managing dependencies
    • Creating unit tests
    • Creating integration tests
    • Scheduling Jobs
    • Versioning code/notebooks
    • Orchestrating Jobs
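
The sketches below illustrate, informally, a few of the task types listed above; they are not official exam material. First, a minimal sketch of triggering a job run through the Databricks REST API (the Jobs 2.1 run-now endpoint). The workspace URL, token, and job ID are placeholders.

    import requests

    # Placeholders: use a real workspace URL and a token kept in a secret store.
    host = "https://<workspace-url>"
    token = "<personal-access-token>"

    # POST /api/2.1/jobs/run-now triggers an existing job by its ID.
    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": 123},
    )
    resp.raise_for_status()
    print(resp.json()["run_id"])  # ID of the run that was just started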
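
Next, a minimal sketch of propagating Change Data Capture (CDC) records into a silver table with a Delta Lake MERGE, after deduplicating the feed so that only the latest change per key is applied. The table and column names (cdc_updates, silver_customers, customer_id, sequence_num, operation) are hypothetical.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()

    # Deduplicate the incoming CDC feed: keep only the latest change per key.
    w = Window.partitionBy("customer_id").orderBy(F.col("sequence_num").desc())
    latest_changes = (
        spark.table("cdc_updates")
        .withColumn("_rn", F.row_number().over(w))
        .filter("_rn = 1")
        .drop("_rn")
    )
    latest_changes.createOrReplaceTempView("latest_changes")

    # Apply inserts, updates, and deletes in a single MERGE (Delta Lake SQL).
    spark.sql("""
        MERGE INTO silver_customers AS t
        USING latest_changes AS s
        ON t.customer_id = s.customer_id
        WHEN MATCHED AND s.operation = 'DELETE' THEN DELETE
        WHEN MATCHED THEN UPDATE SET
          t.email = s.email, t.region = s.region, t.sequence_num = s.sequence_num
        WHEN NOT MATCHED AND s.operation != 'DELETE' THEN
          INSERT (customer_id, email, region, sequence_num)
          VALUES (s.customer_id, s.email, s.region, s.sequence_num)
    """)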
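
For the data modeling bullet, Delta Lake tables support NOT NULL and CHECK constraints for basic quality enforcement. This sketch creates the hypothetical silver_customers table used in the neighboring sketches; the constraint values are made up.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # A silver-layer Delta table with a NOT NULL key and a CHECK constraint.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS silver_customers (
          customer_id  BIGINT NOT NULL,
          email        STRING,
          region       STRING,
          sequence_num BIGINT
        ) USING DELTA
    """)
    spark.sql("""
        ALTER TABLE silver_customers
        ADD CONSTRAINT valid_region CHECK (region IN ('US', 'EU', 'APAC'))
    """)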
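
For security and governance, a minimal sketch of a dynamic view that redacts a PII column for users outside a privileged group and restricts rows by region. is_member() is a Databricks SQL function; the group, view, and table names are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Column-level control: redact email unless the user is in 'pii_admins'.
    # Row-level control: non-members of 'all_regions' see only US rows.
    spark.sql("""
        CREATE OR REPLACE VIEW customers_redacted AS
        SELECT
          customer_id,
          CASE WHEN is_member('pii_admins') THEN email ELSE 'REDACTED' END AS email,
          region
        FROM silver_customers
        WHERE is_member('all_regions') OR region = 'US'
    """)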
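
For monitoring, a sketch of attaching a custom listener. spark.extraListeners is a standard Spark setting, but it only takes effect if it is in place before the SparkContext starts (for example, in the cluster’s Spark config), and com.example.MetricsListener is a hypothetical JVM listener class that would be installed on the cluster as a library.

    from pyspark.sql import SparkSession

    # Register a hypothetical SparkListener implementation so stage and job
    # events can be recorded as logged metrics.
    spark = (
        SparkSession.builder
        .config("spark.extraListeners", "com.example.MetricsListener")
        .getOrCreate()
    )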
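
Finally, for testing, a minimal pytest-style unit test of a small transformation function. The function under test (dedupe_latest) is hypothetical, and the test runs against a local SparkSession so it can execute outside Databricks.

    import pytest
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    def dedupe_latest(df, key, order_col):
        """Keep only the most recent row per key."""
        w = Window.partitionBy(key).orderBy(F.col(order_col).desc())
        return df.withColumn("_rn", F.row_number().over(w)).filter("_rn = 1").drop("_rn")

    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

    def test_dedupe_keeps_latest(spark):
        df = spark.createDataFrame([(1, 10), (1, 20), (2, 5)], ["id", "ts"])
        result = dedupe_latest(df, "id", "ts").collect()
        assert {(r.id, r.ts) for r in result} == {(1, 20), (2, 5)}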

Duration

Testers will have 120 minutes to complete the certification exam.

Questions

There are 60 multiple-choice questions on the certification exam, distributed across the following high-level topics:

  • Databricks Tooling – 20% (12/60)
  • Data Processing – 30% (18/60)
  • Data Modeling – 20% (12/60)
  • Security and Governance – 10% (6/60)
  • Monitoring and Logging – 10% (6/60)
  • Testing and Deployment – 10% (6/60)

Cost

Each attempt of the certification exam costs $200. Taxes may apply depending on the tester’s location. Testers may retake the exam as many times as they like, but each attempt requires a new $200 payment.

Test Aids

There are no test aids available during this exam.

Programming Language

This certification exam’s code examples will primarily be in Python. However, all references to Delta Lake functionality will be made in SQL.
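
For illustration only, a hypothetical snippet in that style: DataFrame logic in Python, with Delta Lake functionality (here, OPTIMIZE and DESCRIBE HISTORY) expressed in SQL. The table names are made up.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # DataFrame API (Python): aggregate orders into a daily gold table.
    daily = (
        spark.table("silver_orders")
        .groupBy(F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("total"))
    )
    daily.write.format("delta").mode("overwrite").saveAsTable("gold_daily_totals")

    # Delta Lake functionality (SQL): compact files and inspect table history.
    spark.sql("OPTIMIZE gold_daily_totals ZORDER BY (order_date)")
    spark.sql("DESCRIBE HISTORY gold_daily_totals").show()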

Expiration

Because the responsibilities of a data engineer and the capabilities of the Databricks Lakehouse Platform evolve quickly, this certification is valid for two years from the date on which the tester passes the certification exam.

Preparation

To learn the content assessed by the certification exam, candidates should take one of the following Databricks Academy courses:

  • Self-paced (available in Databricks Academy): Advanced Data Engineering with Databricks
  • Self-paced (available in Databricks Academy): Certification Overview: Databricks Certified Data Engineer Professional Exam

Note: the preparation material for this certification exam is not available as customer-facing instructor-led training.

Frequently Asked Questions

To view answers to frequently asked questions (FAQs), please refer to the Databricks Academy FAQ document.