From Spaghetti Bowl Pipeline to Lakeflow Declarative Pipelines Efficiency

Overview
Tuesday, June 10, 12:20 pm
| Experience | In Person |
|---|---|
| Type | Lightning Talk |
| Track | Data Engineering and Streaming |
| Industry | Health and Life Sciences |
| Technologies | Databricks Workflows, DLT, Unity Catalog |
| Skill Level | Beginner |
| Duration | 20 min |
In today's data-driven world, the ability to efficiently manage and transform data is crucial for any organization. This presentation will explore the process of converting a complex, messy workflow into a clean and simple Lakeflow Declarative Pipeline at a large integrated health system, Intermountain Health.

Alteryx is a powerful tool for data preparation and blending, but as workflows grow in complexity, they can become difficult to manage and maintain. Lakeflow Declarative Pipelines, by contrast, offers a more democratized, streamlined and scalable approach to data engineering, leveraging the power of Apache Spark and Delta Lake.

We will begin by examining a typical legacy workflow, identifying common pain points such as tangled logic, performance bottlenecks and maintenance challenges. Next, we will demonstrate how to translate that workflow into a Lakeflow Declarative Pipeline, highlighting key steps such as data transformation, validation and delivery.
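The kind of transformation-validation-delivery pipeline described above can be sketched as a minimal Lakeflow Declarative Pipelines (DLT) definition in Python. This is an illustrative sketch only: the table names, source path, column names and expectation rule are hypothetical examples, not the speaker's actual pipeline, and the code runs only inside a Databricks pipeline runtime (where `spark` is provided implicitly).

```python
import dlt  # available inside a Databricks Lakeflow / DLT pipeline runtime
from pyspark.sql.functions import col

# Bronze: ingest raw files as-is (hypothetical landing-zone path)
@dlt.table(comment="Raw encounter records from the landing zone")
def raw_encounters():
    return spark.read.format("json").load("/Volumes/landing/encounters/")

# Silver: transform and validate; rows failing the expectation are dropped
@dlt.table(comment="Cleaned encounters with a basic quality check")
@dlt.expect_or_drop("valid_patient_id", "patient_id IS NOT NULL")
def clean_encounters():
    return (
        dlt.read("raw_encounters")
        .withColumn("encounter_date", col("encounter_ts").cast("date"))
    )

# Gold: deliver an aggregate for downstream reporting
@dlt.table(comment="Daily encounter counts for delivery")
def daily_encounter_counts():
    return dlt.read("clean_encounters").groupBy("encounter_date").count()
```

Because the pipeline is declarative, dependencies between the three tables are inferred from the `dlt.read` calls, so the engine, not the author, owns orchestration order, retries and lineage.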
Session Speakers
Peter Jones
Analytics Engineer
Intermountain Healthcare