Data Ingestion with Lakeflow Connect
Overview
Experience | In Person |
---|---|
Type | Paid Training |
Duration | 240 min |
In this course, you’ll learn how to have efficient data ingestion with Lakeflow Connect and manage that data. Topics include ingestion with built-in connectors for SaaS applications, databases and file sources, as well as ingestion from cloud object storage, and batch and streaming ingestion. We'll cover the new connector components, setting up the pipeline, validating the source and mapping to the destination for each type of connector. We'll also cover how to ingest data with Batch to Streaming ingestion into Delta tables, using the UI with Auto Loader, automating ETL with DLT or using the API.
This will prepare you to deliver the high-quality, timely data required for AI-driven applications by enabling scalable, reliable, and real-time data ingestion pipelines. Whether you're supporting ML model training or powering real-time AI insights, these ingestion workflows form a critical foundation for successful AI implementation.
Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, groupby, join, etc), beginner programming experience with Python (syntax, conditions, loops, functions), beginner programming experience with the Spark DataFrame API (Configure DataFrameReader and DataFrameWriter to read and write data, Express query transformations using DataFrame methods and Column expressions, etc.
Labs: No
Certification Path: Databricks Certified Data Engineer Associate