From Spaghetti Bowl Pipeline to Lakeflow Declarative Pipelines Efficiency

Overview
Tuesday, June 10 | 12:20 pm

| Experience | In Person |
|---|---|
| Type | Lightning Talk |
| Track | Data Engineering and Streaming |
| Industry | Health and Life Sciences |
| Technologies | Databricks Workflows, DLT, Unity Catalog |
| Skill Level | Beginner |
| Duration | 20 min |
In today's data-driven world, the ability to efficiently manage and transform data is crucial for any organization. This presentation explores the process of converting a complex, messy workflow into clean, simple Lakeflow Declarative Pipelines at Intermountain Health, a large integrated health system.

Alteryx is a powerful tool for data preparation and blending, but as workflows grow in complexity, they can become difficult to manage and maintain. Lakeflow Declarative Pipelines, on the other hand, offer a more democratized, streamlined and scalable approach to data engineering, leveraging the power of Apache Spark and Delta Lake.

We will begin by examining a typical legacy workflow, identifying common pain points such as tangled logic, performance bottlenecks and maintenance challenges. Next, we will demonstrate how to translate this workflow into a Lakeflow Declarative Pipeline, highlighting key steps such as data transformation, validation and delivery.
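To make the transformation/validation/delivery steps concrete, a declarative pipeline of this shape might be sketched as below. This is a minimal illustration, not the speaker's actual pipeline: the table names, source path, and expectation rules are all hypothetical, and the code only runs inside a Databricks Lakeflow Declarative Pipelines (DLT) pipeline, where the `dlt` module and `spark` session are provided.

```python
import dlt  # available only inside a Databricks Lakeflow/DLT pipeline
from pyspark.sql.functions import col

# Transformation: ingest raw data into a bronze table.
# The source path and table names here are illustrative placeholders.
@dlt.table(comment="Raw records ingested from the legacy landing zone")
def bronze_records():
    return spark.read.format("csv").option("header", "true").load("/example/landing/")

# Validation: declarative expectations drop rows that fail quality rules,
# replacing hand-wired error-handling branches in the legacy workflow.
@dlt.expect_or_drop("valid_id", "record_id IS NOT NULL")
@dlt.expect_or_drop("positive_amount", "amount > 0")
@dlt.table(comment="Validated, cleaned records")
def silver_records():
    return dlt.read("bronze_records").withColumn("amount", col("amount").cast("double"))

# Delivery: an aggregated gold table ready for downstream consumers.
@dlt.table(comment="Aggregates delivered to reporting")
def gold_summary():
    return dlt.read("silver_records").groupBy("category").count()
```

The framework infers the dependency graph (bronze → silver → gold) from the `dlt.read` calls, which is what replaces the tangled, manually ordered steps of a legacy workflow.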
Session Speakers
Peter Jones
Analytics Engineer
Intermountain Healthcare