Session

Cache Smarter, Not Harder: Building a Semantic Cache Gateway with Lakebase and MLflow

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Enterprise Technology
Technologies: Agent Bricks, Lakebase
Skill Level: Intermediate
As LLM applications move into production, redundant calls quietly drive up latency, cost, and complexity. Traditional caching falls short: exact string matches miss the majority of real-world queries, where users ask the same question in different ways.

In this session, we introduce semantic caching, a smarter approach that retrieves responses based on intent, not syntax. You'll learn how to build a production-ready semantic cache gateway using Lakebase and MLflow to capture paraphrased and context-aware queries. We'll walk through why exact-match caching breaks down at scale, how to design a low-latency caching layer with a FastAPI gateway on Databricks Apps, and how to incorporate conversational context for multi-turn accuracy, plus the observability and governance patterns that make it production-safe.

Walk away with a deployable architecture and proven patterns to cut LLM costs by up to 80% while improving response speed and reliability.
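To make the core idea concrete, here is a minimal sketch of the semantic-cache lookup the session describes: embed each query, compare against cached entries by cosine similarity, and return a cached response when similarity clears a threshold. The `SemanticCache` class, its `threshold` default, and the caller-supplied `embed` function are illustrative assumptions, not the session's actual implementation; in the architecture presented, embeddings and cached responses would live in Lakebase behind a FastAPI gateway rather than in an in-memory list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Toy semantic cache: a real deployment would use a vector index
    in Lakebase, not a linear scan over an in-memory list."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # callable: text -> list[float] (assumed)
        self.threshold = threshold  # minimum similarity to count as a hit
        self.entries = []           # list of (embedding, cached response)

    def get(self, query):
        """Return the closest cached response if it is similar enough, else None."""
        qv = self.embed(query)
        best, best_sim = None, 0.0
        for vec, response in self.entries:
            sim = cosine(qv, vec)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def put(self, query, response):
        """Store a query's embedding alongside the LLM response."""
        self.entries.append((self.embed(query), response))
```

Because matching happens in embedding space, a paraphrase like "reset password please" can hit a cache entry created for "How do I reset my password?", which is exactly the class of queries exact-match caching misses.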

Session Speakers


Brian Law

Sr. Specialist Solutions Architect
Databricks


Ananya Roy

Sr. Specialist Solutions Architect
Databricks