Accelerate Your Career With Data Engineer Learning Pathway Improvements

Published: November 29, 2022

News | 3 min read

Value of Databricks Training & Certifications

Databricks Academy offers training and certifications that help Databricks users master the latest and most relevant data, analytics, and AI skills so they can advance in their current role or expand into new opportunities. Whether learners want to upskill on the lakehouse or become a Databricks expert, the training and certifications help them build lakehouse expertise and drive use cases, from data warehousing to machine learning, that increase productivity and improve outcomes for their organization.

Syed Kazmi, Data Engineer at BluTech Consulting, reiterates the value of Databricks certifications: "Getting certified by an industry-leading organization really helps to gain knowledge as well as the trust of a client." We're continuing to expand our curriculum and opportunities at Databricks Academy, and we're excited to introduce a number of new enhancements for data engineers!

Improvements in the Data Engineer Associate Learning Path

We've made improvements to the Data Engineer Associate learning path to better prepare you for the certification exam. These include updates to the self-paced learning content and the certification exam, as well as knowledge checks at the end of each module that assess your understanding of each concept. The knowledge checks help you upskill on the lakehouse and validate your knowledge in preparation for the certification exam. We've started with the Databricks Data Engineer Associate self-paced content and will continue to add knowledge checks to each role-based learning path.

Below is a preview of some of the knowledge checks following the modules in the Data Engineer Associate learning path:

A data engineer is creating a multi-node cluster.

Which of the following statements describes how workloads will be distributed across this cluster? Select one response.
- Workloads are distributed across available worker nodes by the driver node.
- Workloads are distributed across available worker nodes by the executor.
- Workloads are distributed across available memory by the executor.
- Workloads are distributed across available compute resources by the executor.
- Workloads are distributed across available driver nodes by the worker node.

An organization's data warehouse team is using a change data capture (CDC) feed that needs to meet the CCPA compliance standards. They are worried that their current architecture will not support this workload.

Which of the following explains how employing Delta Lake in a data lakehouse architecture addresses these concerns? Select one response.
- Delta Lake supports merge, update and delete operations to enable complex use cases.
- Delta Lake supports data management for transformations based on a target schema for each processing step.
- Delta Lake supports automatic logging of experiments, parameters and results from notebooks directly to MLflow.
- Delta Lake supports expectations to define expected data quality and specify how to handle records that fail those expectations.
- Delta Lake supports integration for experiment tracking and built-in ML best practices.

A data engineer needs a reference to the results of a query that can be referenced across multiple queries within the scope of the environment session. The data engineer does not want the reference to exist outside of the scope of the environment session.

Which of the following approaches accomplishes this? Select one response.
- They can store the results of their query within a temporary view.
- They can store the results of their query within a reusable user-defined function (UDF).
- They can store the results of their query within a view.
- They can store the results of their query within a common table expression (CTE).
- They can store the results of their query within a table.
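
For readers who want to see the scoping behavior this question probes, here is a minimal sketch. It is not Databricks code: it uses SQLite from Python's standard library as a stand-in, because SQLite's `CREATE TEMP VIEW` is also limited to the connection that created it. In Spark SQL the analogous statement is `CREATE TEMPORARY VIEW`, whose scope is the Spark session. The `orders` table, the `large_orders` name, and the database file are hypothetical and exist only for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file and table, used only for illustration.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

session_a = sqlite3.connect(db_path)
session_a.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
session_a.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 250.0), (3, 75.5)],
)
session_a.commit()

# A CTE exists only inside the single statement that defines it.
cte_rows = session_a.execute(
    "WITH large_orders AS (SELECT * FROM orders WHERE amount > 50) "
    "SELECT COUNT(*) FROM large_orders"
).fetchone()[0]

# A temporary view can be reused by later queries in the same session.
session_a.execute(
    "CREATE TEMP VIEW large_orders AS SELECT * FROM orders WHERE amount > 50"
)
count_in_a = session_a.execute("SELECT COUNT(*) FROM large_orders").fetchone()[0]
total_in_a = session_a.execute("SELECT SUM(amount) FROM large_orders").fetchone()[0]

# A second session on the same database does not see the temporary view.
session_b = sqlite3.connect(db_path)
try:
    session_b.execute("SELECT COUNT(*) FROM large_orders")
    visible_in_b = True
except sqlite3.OperationalError:
    visible_in_b = False

session_a.close()
session_b.close()
```

The two queries against `large_orders` in the first session reuse one definition, while the second session fails to resolve the name at all. A regular view or table would instead be stored in the database and remain visible to other sessions.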

In addition, the Data Engineering learning pathway and the Databricks Data Engineer Associate certification exam are being updated so that they continue to reflect best practices for data engineering on the Databricks Lakehouse Platform. The changes include more robust data governance and security content following the release of Unity Catalog, deeper coverage of building data pipelines with Delta Live Tables, and a Python-first learning and certification experience.

How to Get Started

Access Databricks Data Engineer Associate self-paced content in Databricks Academy by starting with the Data Engineering with Databricks course in the Data Engineer Learning Plan.
