SESSION

Optimal LLM Batch Inference

OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
TRACK: Generative AI
INDUSTRY: Health and Life Sciences
TECHNOLOGIES: Databricks Experience (DBX), AI/Machine Learning, ETL, MLflow
SKILL LEVEL: Intermediate
DURATION: 20 min

Loading LLMs on a GPU cluster is a resource-intensive task and, if not done right, can result in the infamous CUDA out-of-memory (OOM) error. This talk explains how to structure batch inference jobs correctly, and covers cluster sizing, batch sizing, and other parameter tuning to get the best performance from AI-powered applications built on the Databricks Data Intelligence Platform.
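As a minimal illustration of the batch-sizing idea the session covers (a sketch, not the speaker's actual recipe), the Python snippet below loads a model once and processes prompts in fixed-size micro-batches so peak GPU memory stays bounded. The model name, batch size, and generation settings are placeholder assumptions to be tuned per GPU.

```python
# Sketch only: keep GPU memory bounded by running inference in
# fixed-size micro-batches instead of tokenizing everything at once.
# MODEL_NAME and BATCH_SIZE are placeholder choices, not recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder model
BATCH_SIZE = 8  # tune to your GPU; too large risks CUDA OOM

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    # Causal LMs often ship without a pad token; reuse EOS for padding.
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16  # half precision halves weight memory
).to(device)
model.eval()

def generate_in_batches(prompts: list[str]) -> list[str]:
    outputs = []
    for i in range(0, len(prompts), BATCH_SIZE):
        batch = prompts[i : i + BATCH_SIZE]
        inputs = tokenizer(
            batch, return_tensors="pt", padding=True, truncation=True
        ).to(device)
        with torch.no_grad():  # no gradient buffers during inference
            generated = model.generate(**inputs, max_new_tokens=128)
        outputs.extend(
            tokenizer.batch_decode(generated, skip_special_tokens=True)
        )
    return outputs
```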

SESSION SPEAKERS

Srijit Chandrashekhar Nair

Sr. Specialist Solutions Architect
Databricks