Build Data Pipelines with Delta Live Tables
- Audience: Data engineers
- Hands-on labs: Yes
- Certification path: Databricks Certified Data Engineer Associate
- Description: In this half-day course, you’ll learn how to define and schedule data pipelines that incrementally ingest and process data through multiple tables in the lakehouse using Delta Live Tables (DLT) in Spark SQL and Python. We’ll cover topics like how to get started with DLT, how DLT tracks data dependencies in data pipelines, how to configure and run data pipelines using the Delta Live Tables UI, how to use Python or Spark SQL to define data pipelines that ingest and process data through multiple tables in the lakehouse using Auto Loader and DLT, how to use APPLY CHANGES INTO syntax to process Change Data Capture feeds, and how to review event logs and data artifacts created by pipelines and troubleshoot DLT syntax.
- Pre-requisites: Beginner familiarity with cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, groupby, join, etc), beginner programming experience with Python (syntax, conditions, loops, functions), beginner programming experience with the Spark DataFrame API (Configure DataFrameReader and DataFrameWriter to read and write data, Express query transformations using DataFrame methods and Column expressions