Session
Production Patterns for Multi-Tenant AI: Scaling to 800 Banks on Databricks
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology, Financial Services |
| Technologies | Unity Catalog, Lakebase |
| Skill Level | Intermediate |
Serving 800 German banks on one AI platform required solving a fundamental multi-tenant challenge: maintaining strict data isolation while scaling to hundreds of QPS under banking regulations. We present two innovations:

1. Metadata-driven capacity management: Delta tables track index capacity per endpoint in real time. When thresholds are reached, the system provisions new Vector Search endpoints and rebalances indexes. This eliminates manual scaling interventions and provides a reusable pattern for any platform hitting infrastructure limits at scale.

2. Hierarchical retrieval with RAPTOR: Plain agents retrieve flat chunks. We implement recursive clustering and summarization to build tree structures, enabling retrieval across abstraction levels, from granular details to high-level themes. This approach significantly improves answer quality for complex financial documents.

Takeaways: federated vector search patterns, metadata-driven auto-scaling architecture, RAPTOR implementation
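As a rough illustration of the metadata-driven capacity management idea in the abstract, the following minimal sketch places a new tenant index on the least-loaded endpoint and adds a new endpoint once every existing one has reached its threshold. It is not the speakers' implementation: the in-memory `Endpoint` records stand in for rows of a capacity-tracking Delta table, and the names `Endpoint`, `place_index`, `soft_limit`, and the `vs-endpoint-N` naming scheme are all hypothetical. No Databricks API is called; actual provisioning and rebalancing are out of scope here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Endpoint:
    """One row of a hypothetical capacity table: an endpoint and its indexes."""
    name: str
    soft_limit: int  # assumed threshold: index count at which the endpoint counts as full
    indexes: List[str] = field(default_factory=list)


def place_index(endpoints: List[Endpoint], index_name: str,
                default_soft_limit: int = 2) -> Endpoint:
    """Assign index_name to the least-loaded endpoint below its threshold.

    If every endpoint has reached its soft limit, append a new endpoint
    record (a stand-in for provisioning one) and place the index there.
    """
    candidates = [e for e in endpoints if len(e.indexes) < e.soft_limit]
    if candidates:
        target = min(candidates, key=lambda e: len(e.indexes))
    else:
        # Hypothetical naming scheme; a real system would provision here.
        target = Endpoint(name=f"vs-endpoint-{len(endpoints) + 1}",
                          soft_limit=default_soft_limit)
        endpoints.append(target)
    target.indexes.append(index_name)
    return target


if __name__ == "__main__":
    fleet = [Endpoint(name="vs-endpoint-1", soft_limit=2)]
    for tenant in ("bank_001_docs", "bank_002_docs", "bank_003_docs"):
        chosen = place_index(fleet, tenant)
        print(tenant, "->", chosen.name)
```

Keeping the threshold as a plain integer per endpoint means the placement decision is a simple metadata lookup, which is the property that makes the pattern easy to automate.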
Session Speakers
Natasha Ueberschlag
Manager, AI Forward Deployed Engineering
Databricks
Simon Schmitz
Senior Data Scientist
Atruvia