Thinking Fast & Slow: How Databricks Built High-Speed and Deep Research Agents
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology |
| Technologies | Unity Catalog, Agent Bricks |
| Skill Level | Advanced |
In today's competitive agentic search landscape, two operating modes are mission-critical. The first is a low-latency, low-cost mode for consumer-facing scale. It must meet strict tail-latency budgets for real-time, high-throughput use cases at minimal per-query cost, without sacrificing quality.
The second is a compute-intensive deep research mode enabling expert-level analysis—from financial due diligence and technology mapping to clinical review and manufacturing diagnostics. The system must plan multi-step retrieval, triangulate sources, and synthesize coherent answers.
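The plan-retrieve-triangulate-synthesize loop described above can be sketched roughly as follows. This is a minimal illustration, not the session's actual implementation: the function names (`plan`, `retrieve`, `triangulate`, `deep_research`) and the placeholder logic inside them are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str
    claim: str

def plan(question: str) -> list[str]:
    # Hypothetical planner: split the question into retrieval sub-queries.
    return [f"{question} (background)", f"{question} (recent findings)"]

def retrieve(query: str) -> list[Evidence]:
    # Stand-in for a real retrieval call against an index or search API.
    return [Evidence(source=f"doc://{abs(hash(query)) % 100}", claim=query)]

def triangulate(evidence: list[Evidence]) -> list[Evidence]:
    # Placeholder cross-checking: deduplicate claims across sources.
    seen: dict[str, Evidence] = {}
    for e in evidence:
        seen.setdefault(e.claim, e)
    return list(seen.values())

def deep_research(question: str) -> str:
    # Multi-step loop: plan sub-queries, retrieve for each, then synthesize.
    evidence: list[Evidence] = []
    for sub_query in plan(question):
        evidence.extend(retrieve(sub_query))
    corroborated = triangulate(evidence)
    return " | ".join(e.claim for e in corroborated)
```

In a real system each stage would call an LLM or retriever; the point here is only the control flow of a multi-step research loop.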
Using our Instructed Retriever and Aroll frameworks, we built a unified agentic harness that supports both modes in a single architecture. Our system sits at the Pareto frontier of cost, speed, and quality: Instant mode delivers single-digit-second latency while maintaining accuracy, and Thinking mode achieves state-of-the-art performance across enterprise domains.
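One way to picture a single harness serving both modes is a router that dispatches a query to either a fast single-pass path or an iterative research path. This sketch is hypothetical: the `Mode` enum, the `answer` function, and the step logic are illustrative assumptions, not the architecture presented in the session.

```python
from enum import Enum

class Mode(Enum):
    INSTANT = "instant"    # low-latency, single retrieval pass
    THINKING = "thinking"  # compute-intensive, multi-step research

def answer(question: str, mode: Mode) -> str:
    if mode is Mode.INSTANT:
        # Fast path: one retrieval pass, return the top result directly.
        return f"[instant] top-ranked passage for: {question}"
    # Slow path: iterate plan/retrieve/refine before synthesizing (placeholder).
    steps = [f"step {i}: refine '{question}'" for i in range(1, 4)]
    return "[thinking] synthesized from " + "; ".join(steps)
```

The design point is that both paths share one entry point and one toolset, so quality, cost, and latency can be traded off per query rather than per deployment.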
Session Speakers
Michael Bendersky
Director (Research)
Databricks