Data + AI Summit 2023
SAN FRANCISCO, JUNE 26-29
VIRTUAL, JUNE 28-29

Cross-Platform Data Lineage with OpenLineage

Wednesday, June 28 @1:30 PM

Overview

There are more data tools available than ever before, and building a pipeline has never been easier. These tools and advancements have created an explosion of innovation. As a result, data within today's organizations is increasingly distributed and can no longer be contained within a single brain, a single team, or a single platform. Data lineage can help by tracing the relationships between datasets and providing a map of your entire data universe.

OpenLineage provides a standard for lineage collection that spans multiple platforms, including Apache Airflow, Apache Spark™, Flink®, and dbt. This empowers teams to diagnose and address widespread data quality and efficiency issues in real time. In this session, we will show how to trace data lineage across Apache Spark and Apache Airflow. There will be a walk-through of the OpenLineage architecture and a live demo of a running pipeline with real-time data lineage.
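
To make the idea of a shared lineage standard concrete, here is a minimal sketch in plain Python, using only the standard library. It builds dictionaries shaped roughly like OpenLineage run events (a run, a job, and input and output datasets) and derives lineage edges from them. The job names, dataset names, namespace, and producer URL are hypothetical, the helper functions are illustrative rather than part of any OpenLineage client, and the event omits fields a real integration would include.

```python
import json
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, job_name, inputs, outputs,
                   namespace="example-namespace"):
    """Build a dict shaped roughly like an OpenLineage run event.

    Illustrative only: names and the producer URL are made up.
    """
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
        "producer": "https://example.com/hypothetical-producer",
    }


def lineage_edges(events):
    """Derive (input dataset, job, output dataset) edges from events."""
    edges = set()
    for ev in events:
        job = ev["job"]["name"]
        for src in ev["inputs"]:
            for dst in ev["outputs"]:
                edges.add((src["name"], job, dst["name"]))
    return sorted(edges)


# Two hypothetical jobs on different platforms that share a dataset.
events = [
    make_run_event("COMPLETE", "airflow.load_orders",
                   ["raw.orders"], ["staging.orders"]),
    make_run_event("COMPLETE", "spark.aggregate_orders",
                   ["staging.orders"], ["analytics.daily_orders"]),
]

print(json.dumps(events[0], indent=2))
for edge in lineage_edges(events):
    print(" -> ".join(edge))
```

Because both jobs describe their datasets in the same shape, the output of one job can be matched to the input of the other by name, which is what lets lineage be stitched together across platforms.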


Type

  • Breakout

Experience

  • In Person

Track

  • Data Governance

Industry

  • Enterprise Technology, Financial Services

Difficulty

  • Intermediate

Duration

  • 40 min

Session Speakers

Julien Le Dem

Chief Architect

Astronomer

Willy Lulciuc

Sr. Software Engineer

Astronomer
