Data + AI Summit 2023
JUNE 26-29, 2023
SAN FRANCISCO + VIRTUAL

Spline: Central Data-Lineage Tracking, Not Only For Spark

On Demand

Type

  • Session

Format

  • Virtual

Track

  • MLOps and DataOps

Difficulty

  • Intermediate

Duration

  • 40 min

Overview

Data lineage tracking continues to be a major problem for many organizations. The variety of data tools and frameworks used in big companies, together with a lack of standards and universal lineage tracking solutions (especially open-source ones), makes it very difficult, and sometimes impossible, to reliably track and visualize dataflows end to end. Spline is one of very few open-source solutions available today that try to address that problem. Spline started as a data-lineage tracking tool for Apache Spark, but it now offers a generic API and model capable of aggregating lineage metadata gathered from different data tools and wiring it all together, providing a full end-to-end representation of how the data flows through the pipelines and how it transforms along the way.
In this presentation we will explain how Spline can be used as a central data-lineage tracking tool for an organization. We’ll briefly cover the high-level architecture and design ideas, outline the challenges and limitations of the current solution, and talk about deployment options. We’ll also discuss how Spline compares to some other open-source tools and how the OpenLineage standard can be leveraged to integrate with them.

Session Speakers

Oleksandr Vayda

Senior Software Engineer

ABSA

Danil Vagapov

Staff Site Reliability Engineer

Outreach
