Data + AI Summit 2023
JUNE 26-29, 2023
SAN FRANCISCO + VIRTUAL

Building Metadata and Lineage Driven Pipelines on Kubernetes

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Science, Machine Learning and MLOps

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 156

Duration

  • 35 min

Overview

Machine learning now plays a critical role across industries, and the ability to compose ML pipelines quickly is essential to success. However, an ML pipeline consists of several components and requires effort from different teams, including data engineers, data scientists, and ML engineers. A typical way to cooperate is to define a sequence of tasks, coordinate the integration, test, apply fixes and enhancements, and repeat. Pipeline components produced by this task-driven approach lack reusability and create ongoing maintenance work. Kubeflow Pipelines, a platform that makes deploying ML pipelines on Kubernetes straightforward and scalable, provides a metadata- and lineage-driven approach to developing platform-independent, portable ML pipelines. With it, the way data is linked and propagated through a pipeline becomes explicit, which in turn makes pipelines easier to compose.

In this talk, we introduce the Intermediate Representation (IR) feature in Kubeflow Pipelines v2, including the specification, the Python SDK, and backend architecture improvements. Composing ML pipelines with IR lets users share and reuse components and speeds up development. It makes the ML ecosystem richer and platform-agnostic. A comprehensive ML pipeline can be visualized and managed without knowing the underlying processing logic.
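To make the idea of a metadata- and lineage-driven pipeline more concrete, here is a minimal, standard-library-only sketch. It is not the Kubeflow Pipelines SDK or its actual IR specification: the `Component` class, the `compile_to_ir` and `upstream_lineage` functions, the field names `producerTask` and `outputKey`, and the `example.com` image names are all hypothetical, chosen only for illustration. The sketch assumes that each component declares typed inputs and outputs, that the pipeline is serialized to a platform-neutral document, and that lineage can then be read from that document without inspecting any component's processing logic.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Component:
    """A hypothetical pipeline step that declares what it consumes and produces."""
    name: str
    image: str
    inputs: dict = field(default_factory=dict)   # input name -> artifact type
    outputs: dict = field(default_factory=dict)  # output name -> artifact type


def compile_to_ir(pipeline_name, components, wiring):
    """Build a platform-neutral description of the pipeline.

    `wiring` maps (consumer task, input name) -> (producer task, output name).
    Artifact types are checked here, so mismatches surface before anything runs.
    """
    by_name = {c.name: c for c in components}
    tasks = {}
    for c in components:
        task_inputs = {}
        for input_name, input_type in c.inputs.items():
            producer, output_name = wiring[(c.name, input_name)]
            produced_type = by_name[producer].outputs[output_name]
            if produced_type != input_type:
                raise TypeError(
                    f"{c.name}.{input_name} expects {input_type}, "
                    f"but {producer}.{output_name} produces {produced_type}"
                )
            # Illustrative field names, not the real specification.
            task_inputs[input_name] = {
                "producerTask": producer,
                "outputKey": output_name,
            }
        tasks[c.name] = {
            "image": c.image,
            "inputs": task_inputs,
            "outputs": dict(c.outputs),
        }
    return {"pipeline": pipeline_name, "tasks": tasks}


def upstream_lineage(ir, task):
    """Return every task whose outputs feed `task`, directly or indirectly."""
    seen = set()
    stack = [task]
    while stack:
        current = stack.pop()
        for ref in ir["tasks"][current]["inputs"].values():
            producer = ref["producerTask"]
            if producer not in seen:
                seen.add(producer)
                stack.append(producer)
    return sorted(seen)


components = [
    Component("ingest", "example.com/ingest:1", outputs={"raw": "Dataset"}),
    Component("prepare", "example.com/prepare:1",
              inputs={"raw": "Dataset"}, outputs={"features": "Dataset"}),
    Component("train", "example.com/train:1",
              inputs={"features": "Dataset"}, outputs={"model": "Model"}),
]
wiring = {
    ("prepare", "raw"): ("ingest", "raw"),
    ("train", "features"): ("prepare", "features"),
}

ir = compile_to_ir("demo", components, wiring)
ir_json = json.dumps(ir, indent=2, sort_keys=True)  # shareable, platform-neutral text
lineage = upstream_lineage(ir, "train")             # tasks that "train" depends on
```

Because the serialized document carries only names, container images, artifact types, and producer references, a backend or UI could in principle render the graph and answer lineage questions from it alone, which is the property the abstract attributes to IR.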

Session Speakers

Tommy Li

Senior Software Developer

IBM

Yi-Hong Wang

Software Developer

IBM
