Building Metadata and Lineage Driven Pipelines on Kubernetes
- Data Science, Machine Learning, and MLOps
- Moscone South | Upper Mezzanine | 156
- 35 min
Machine learning now plays a critical role in every industry amid its widespread adoption, and composing ML pipelines quickly is essential to success. However, an ML pipeline consists of several components and requires effort from different teams, including data engineers, data scientists, and ML engineers. A typical cooperation strategy is to define a sequence of tasks, coordinate the integration, test, apply fixes and enhancements, and repeat. ML pipeline components produced by this task-driven approach lack reusability and only add maintenance effort. Kubeflow Pipelines, a platform that makes deploying ML pipelines on Kubernetes straightforward and scalable, provides a metadata- and lineage-driven approach to developing platform-independent, portable ML pipelines. Data linkage and propagation within ML pipelines become clear, which also makes pipelines easier to compose.
In this talk, we will introduce the Intermediate Representation (IR) feature in Kubeflow v2, including the specification, the Python SDK, and backend architecture improvements. Using IR to compose ML pipelines allows users to share and reuse components and to increase the pace of development. It makes the ML ecosystem richer and platform-agnostic. A comprehensive ML pipeline can be visualized and managed without knowing the underlying processing logic.
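To make the idea concrete, the following is a minimal, standard-library-only sketch of what composing a pipeline from reusable components and serializing it to a platform-neutral description might look like. It is a toy illustration, not the Kubeflow Pipelines SDK or the actual IR specification: the names `Component`, `compile_to_ir`, and `lineage`, and the `"task.output"` reference format, are all hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical reusable component: a name plus declared artifact names."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def compile_to_ir(pipeline_name, tasks):
    """Serialize tasks into a platform-neutral dict (a toy stand-in for an IR).

    `tasks` maps a task name to (component, {input_name: "producer_task.output"}).
    """
    ir = {"pipeline": pipeline_name, "tasks": {}}
    for task_name, (component, bindings) in tasks.items():
        # Every declared input must be bound to some upstream output.
        missing = set(component.inputs) - set(bindings)
        if missing:
            raise ValueError(f"{task_name}: unbound inputs {sorted(missing)}")
        ir["tasks"][task_name] = {
            "component": component.name,
            "inputs": dict(bindings),
            "outputs": list(component.outputs),
        }
    return ir


def lineage(ir, artifact):
    """Return the set of upstream artifacts that `artifact` ("task.output") depends on."""
    task_name, _ = artifact.split(".", 1)
    upstream = set()
    for source in ir["tasks"][task_name]["inputs"].values():
        upstream.add(source)
        upstream |= lineage(ir, source)
    return upstream


# Compose a three-step pipeline from independently defined components.
ingest = Component("ingest", outputs=["dataset"])
train = Component("train", inputs=["dataset"], outputs=["model"])
evaluate = Component("evaluate", inputs=["model", "dataset"], outputs=["metrics"])

ir = compile_to_ir("demo", {
    "ingest": (ingest, {}),
    "train": (train, {"dataset": "ingest.dataset"}),
    "evaluate": (evaluate, {"model": "train.model", "dataset": "ingest.dataset"}),
})

print(json.dumps(ir, indent=2))            # the serialized, shareable description
print(sorted(lineage(ir, "evaluate.metrics")))  # upstream artifacts of the metrics
```

Because the serialized description records only component names and artifact bindings, the data flow and lineage can be inspected without looking at any component's processing logic, which is the property the abstract attributes to the IR-based approach.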