Deep Dive Into Streaming and Batch ETLs With Lakeflow Spark Declarative Pipelines
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Enterprise Technology, Consulting & Services |
| Technologies | Lakeflow, Unity Catalog |
| Skill Level | Advanced |
Let's take a deep dive into the new declarative ETL pipeline framework in Apache Spark™: Lakeflow Spark Declarative Pipelines (SDP). We will peel back the layers to learn how SDP's high-level Python and SQL abstractions translate into lower-level Spark SQL and Spark Structured Streaming queries. During the talk, you will learn how SDP automatically resolves complex dependencies and builds optimized Directed Acyclic Graphs (DAGs) for both batch and streaming workloads. We will walk through the internal state management and orchestration logic that lets SDP handle retries and incremental processing out of the box, replacing thousands of lines of imperative "glue code." You will leave with a clear mental model of the engine's architecture and the trade-offs of SDP, ready to debug and optimize your pipelines and to decide where and how to use SDP alongside your existing ETL pipelines.
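To give a flavor of the dependency resolution the talk covers: a declarative framework infers execution order from which datasets each definition reads, rather than from hand-written orchestration. The sketch below is not SDP's actual API or internals; it is a minimal, self-contained illustration using Python's standard-library `graphlib`, with hypothetical table names (`bronze_orders`, `silver_orders`, `silver_customers`, `gold_revenue`).

```python
from graphlib import TopologicalSorter

# Hypothetical declarative definitions: each dataset lists the
# datasets it reads from. In SDP these references would come from
# your Python/SQL table definitions; here they are written by hand.
definitions = {
    "bronze_orders": [],                            # reads a raw source only
    "silver_orders": ["bronze_orders"],
    "silver_customers": [],                         # reads a raw source only
    "gold_revenue": ["silver_orders", "silver_customers"],
}

# A declarative engine derives a DAG from these references and
# topologically sorts it to find a valid execution order, replacing
# imperative "run A, then B" glue code.
order = list(TopologicalSorter(definitions).static_order())
print(order)
```

Every upstream table appears before its downstream consumers in `order`; datasets with no dependency between them (here the bronze and silver sources) are free to run in parallel, which is where an engine can optimize the schedule.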
Session Speakers
Jacek Laskowski
Freelance Data Engineer
books.japila.pl