Scaling GenAI Inference From Prototype to Production: Real-World Lessons in Speed & Cost

Overview

Experience: In Person
Type: Lightning Talk
Track: Artificial Intelligence
Industry: Education, Media and Entertainment
Technologies: Delta Lake, Data Marketplace, Databricks Workflows
Skill Level: Intermediate
Duration: 20 min

This lightning talk dives into real-world GenAI projects that scaled from prototype to production using Databricks’ fully managed tools. Facing cost and time constraints, we leveraged four key Databricks features—Workflows, Model Serving, Serverless Compute, and Notebooks—to build an AI inference pipeline processing millions of documents (text and audiobooks).


This approach enables rapid experimentation, easy tuning of GenAI prompts and compute settings, seamless data iteration, and efficient quality testing, allowing data scientists and engineers to collaborate effectively. Learn how to design modular, parameterized notebooks that run concurrently, manage dependencies, and accelerate AI-driven insights.
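The pattern of concurrent, parameterized notebooks with managed dependencies maps naturally onto a multi-task Databricks Workflows job. The sketch below is illustrative only, not from the session itself: the notebook paths, task names, and parameter values are hypothetical, and real jobs would also specify cluster or serverless compute settings. It shows two inference tasks running in parallel, each receiving its own prompt and input-path parameters, with a downstream quality-check task gated on both via `depends_on`.

```json
{
  "name": "genai-inference-pipeline",
  "tasks": [
    {
      "task_key": "infer_text_docs",
      "notebook_task": {
        "notebook_path": "/Pipelines/inference_notebook",
        "base_parameters": {
          "input_path": "/data/text_docs",
          "prompt_version": "v3"
        }
      }
    },
    {
      "task_key": "infer_audiobooks",
      "notebook_task": {
        "notebook_path": "/Pipelines/inference_notebook",
        "base_parameters": {
          "input_path": "/data/audiobooks",
          "prompt_version": "v3"
        }
      }
    },
    {
      "task_key": "quality_checks",
      "depends_on": [
        { "task_key": "infer_text_docs" },
        { "task_key": "infer_audiobooks" }
      ],
      "notebook_task": {
        "notebook_path": "/Pipelines/quality_checks"
      }
    }
  ]
}
```

Because both inference tasks reuse the same notebook with different `base_parameters`, prompt or compute changes happen in one place, and the two branches run concurrently rather than serially.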


Whether you're optimizing AI inference, automating complex data workflows or architecting next-gen serverless AI systems, this session delivers actionable strategies to maximize performance while keeping costs low.

Session Speakers

Anish Kumar

Lead Engineer
Scribd