Cache Smarter, Not Harder: Building a Semantic Cache Gateway with Lakebase and MLflow
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology |
| Technologies | Agent Bricks, Lakebase |
| Skill Level | Intermediate |
As LLM applications move into production, redundant calls quietly drive up latency, cost, and complexity. Traditional caching falls short: exact string matches miss the majority of real-world queries, where users ask the same question in different ways.

In this session, we introduce semantic caching: a smarter approach that retrieves responses based on intent, not syntax. You'll learn how to build a production-ready semantic cache gateway using Lakebase and MLflow to capture paraphrased and context-aware queries. We'll walk through why exact-match caching breaks down at scale, how to design a low-latency caching layer with a FastAPI gateway on Databricks Apps, and how to incorporate conversational context for multi-turn accuracy, plus the observability and governance patterns that make it production-safe.

Walk away with a deployable architecture and proven patterns to cut LLM costs by up to 80% while improving response speed and reliability.
Session Speakers
Brian Law
Sr. Specialist Solutions Architect
Databricks
Ananya Roy
Sr. Specialist Solutions Architect
Databricks