Spline: Central Data-Lineage Tracking, Not Only For Spark
On Demand
Type
- Session
Format
- Virtual
Track
- MLOps and DataOps
Difficulty
- Intermediate
Duration
- 40 min
Overview
Data lineage tracking continues to be a major problem for many organizations. The variety of data tools and frameworks used in big companies’ and a lack of standards and universal lineage tracking solutions (especially open-source ones) makes it very difficult or sometimes even impossible to reliably track and visualize dataflows end to end. Spline is one of a very few open-source solutions available nowadays that tries to address that problem. Spline has started as a data-lineage tracking tool for Apache Spark. But now it offers a generic API and model that is capable to aggregate lineage metadata gathered from different data tools, wire it all together, providing a full end-to-end representation of how the data flows through the pipelines, and how it transforms along the way.
In this presentation we will explain how Spline can be used as a central data-lineage tracking tool for the organization. We’ll briefly cover the high-level architecture and design ideas, outline challenges and limitations of the current solution, and talk about deployment options. We’ll also talk about how Spline compares to some other open-source tools, and how OpenLineage standard can be leveraged to integrate with them.
In this presentation we will explain how Spline can be used as a central data-lineage tracking tool for the organization. We’ll briefly cover the high-level architecture and design ideas, outline challenges and limitations of the current solution, and talk about deployment options. We’ll also talk about how Spline compares to some other open-source tools, and how OpenLineage standard can be leveraged to integrate with them.
See the best of Data+AI Summit
Watch on demand