Efficient and Multi-Tenant Scheduling of Big Data and AI Workloads
- MLOps and DataOps
- Moscone South | Upper Mezzanine | 160
- 35 min
Many ML and big data teams in the open source community are looking to run their workloads in the cloud and they invariably face a common set of challenges such as multi-tenant cluster management, resource fairness and sharing, gang scheduling and cost-effective infrastructure operations. Kubernetes is the de-facto standard platform for running containerized applications in the cloud. However, the default resource scheduler in Kubernetes leaves more to be desired for AI scenarios when running ML/DL training workloads or large-scale data processing jobs for feature engineering. In this talk, we will share how the community leverage and build upon Apache YuniKorn to address the unique resource scheduling needs for ML and big data teams.