Session

From Spaghetti Bowl Pipeline to Lakeflow Declarative Pipelines Efficiency

Overview

Tuesday

June 10

12:20 pm

Experience: In Person
Type: Lightning Talk
Track: Data Engineering and Streaming
Industry: Health and Life Sciences
Technologies: Databricks Workflows, DLT, Unity Catalog
Skill Level: Beginner
Duration: 20 min

In today's data-driven world, the ability to efficiently manage and transform data is crucial for any organization. This presentation will explore the process of converting a complex and messy workflow into a clean and simple Lakeflow Declarative Pipeline at a large integrated health system, Intermountain Health.

Alteryx is a powerful tool for data preparation and blending, but as workflows grow in complexity, they can become difficult to manage and maintain. Lakeflow Declarative Pipelines, on the other hand, offers a more democratized, streamlined and scalable approach to data engineering, leveraging the power of Apache Spark and Delta Lake.

We will begin by examining a typical legacy workflow, identifying common pain points such as tangled logic, performance bottlenecks and maintenance challenges. Next, we will demonstrate how to translate this workflow into a Lakeflow Declarative Pipeline, highlighting key steps such as data transformation, validation and delivery.

Session Speakers

Peter Jones

Analytics Engineer
Intermountain Healthcare