JUNE 26-29, 2023
SAN FRANCISCO + VIRTUAL

Building Metadata and Lineage Driven Pipelines on Kubernetes

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Science, Machine Learning and MLOps

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 156

Duration

  • 35 min

Overview

Machine learning now plays a critical role across industries as its adoption spreads, and composing ML pipelines quickly is essential to success. However, an ML pipeline consists of several components and requires the effort of different teams, including data engineers, data scientists, and ML engineers. A typical cooperation strategy is to define a sequence of tasks, coordinate the integration, test, apply fixes and enhancements, and repeat. ML pipeline components produced by this task-driven approach lack reusability and add maintenance effort. Kubeflow Pipelines, a platform that makes deploying ML pipelines on Kubernetes straightforward and scalable, provides a metadata- and lineage-driven approach to developing platform-independent, portable ML pipelines. Data linkage and propagation within a pipeline become explicit, which also makes pipelines easier to compose.

In this talk, we will introduce the Intermediate Representation (IR) feature in Kubeflow Pipelines v2, including the specification, the Python SDK, and backend architecture improvements. Composing ML pipelines with IR lets users share and reuse components and speeds up development. It makes the ML ecosystem richer and platform-agnostic. A comprehensive ML pipeline can be visualized and managed without knowing the underlying processing logic.
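
To make the idea concrete, here is a minimal sketch of why a declarative, platform-neutral pipeline description makes lineage easy to read. It is a toy model written with the Python standard library only. It is not the Kubeflow Pipelines SDK and does not follow the real IR schema. The names `Component`, `Pipeline`, `add_task`, `to_spec`, and `lineage` are all hypothetical and exist only for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A reusable step described only by its interface (hypothetical model)."""
    name: str
    inputs: tuple = ()
    outputs: tuple = ()


@dataclass
class Pipeline:
    """Collects tasks and the artifact references that link them."""
    name: str
    tasks: dict = field(default_factory=dict)  # task name -> task spec

    def add_task(self, component, **bindings):
        """Bind every input of `component` to an upstream 'task.output' reference."""
        if set(bindings) != set(component.inputs):
            raise ValueError(f"{component.name}: bindings must match {component.inputs}")
        for ref in bindings.values():
            task, _, output = ref.partition(".")
            if task not in self.tasks or output not in self.tasks[task]["outputs"]:
                raise ValueError(f"unknown artifact reference: {ref}")
        self.tasks[component.name] = {
            "inputs": dict(bindings),
            "outputs": list(component.outputs),
        }

    def to_spec(self):
        """Serialize to plain JSON: an assumed stand-in for a portable pipeline spec."""
        return json.dumps({"pipeline": self.name, "tasks": self.tasks},
                          indent=2, sort_keys=True)


def lineage(spec_json, artifact):
    """Return every upstream artifact that `artifact` ('task.output') derives from."""
    tasks = json.loads(spec_json)["tasks"]
    seen, stack = [], [artifact]
    while stack:
        task = stack.pop().partition(".")[0]
        for ref in tasks[task]["inputs"].values():
            if ref not in seen:
                seen.append(ref)
                stack.append(ref)
    return seen


# Three illustrative components; only their interfaces are declared.
ingest = Component("ingest", outputs=("dataset",))
train = Component("train", inputs=("data",), outputs=("model",))
evaluate = Component("evaluate", inputs=("model", "data"), outputs=("metrics",))

demo = Pipeline("demo")
demo.add_task(ingest)
demo.add_task(train, data="ingest.dataset")
demo.add_task(evaluate, model="train.model", data="ingest.dataset")

spec = demo.to_spec()
print(spec)
print(sorted(lineage(spec, "evaluate.metrics")))
```

Because the spec records only interfaces and artifact references, the lineage query works on the serialized JSON alone, without importing or running any component logic. That is the property the abstract attributes to the metadata- and lineage-driven approach.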

Session Speakers

Tommy Li

Senior Software Developer

IBM

YI-HONG WANG

Software Developer

IBM
