Databricks is proud to be a platinum sponsor of NeurIPS 2025. The conference runs from December 2 to 7 in San Diego, California.
Stop by booth #1619 in the Expo Hall from December 2 to 5 to meet members of our research, engineering, and recruiting teams and learn about our latest work and open roles.
Poster Session
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
Sam Havens, Michael Carbin, Andrew Drozdov, Nandan Thakur
FreshStack is a new, end-to-end framework for automatically generating modern, realistic information retrieval benchmarks. It builds evaluation datasets by collecting up-to-date technical corpora, extracting fine-grained nuggets from real community Q&A, and testing retrieval quality using a fusion of retrieval methods. Across five fast-moving technical domains, baseline retrieval models perform far below oracle systems, revealing substantial headroom for improving IR and RAG pipelines. FreshStack also uncovers cases where reranking provides no lift and where oracle context dramatically boosts LLM answer quality.
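The summary above mentions fusing multiple retrieval methods to build a strong reference ranking. As an illustration only (not the paper's implementation), here is a minimal sketch of reciprocal rank fusion, one common way to combine ranked lists from different retrievers; the retriever outputs and document IDs below are hypothetical.

```python
# Illustrative sketch: reciprocal rank fusion (RRF), one standard way to
# combine rankings from multiple retrieval methods. FreshStack's exact
# fusion procedure is described in the paper; this is not its implementation.

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first by one retriever
    k: smoothing constant from the original RRF formulation
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: rankings from a lexical and a dense retriever
bm25_ranking = ["doc_3", "doc_1", "doc_7"]
dense_ranking = ["doc_1", "doc_5", "doc_3"]
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
```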
Workshop
Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs
Jacob Portes, Connor Jennings, Erica Ji Yuen, Sasha Doubov, Michael Carbin
This work examines how retrieval quality improves as LLMs scale in size, training duration, and total pretraining FLOPs. Evaluating models from 125M to 7B parameters trained on 1B–2T tokens, the study shows that zero-shot BEIR retrieval performance follows predictable scaling trends: larger and longer-trained models retrieve better. The results also reveal a strong correlation between retrieval accuracy and in-context learning ability, suggesting shared underlying mechanisms, and offer practical guidance for designing and training the next generation of LLM-based retrievers.
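To make the idea of a scaling trend concrete, here is a minimal sketch of fitting a power-law relationship between pretraining compute and a retrieval metric. The FLOP counts and nDCG values are made up for demonstration and are not results from the paper.

```python
# Illustrative only: fitting a simple power-law trend, score ~ a * FLOPs^b,
# the kind of scaling relationship the paper studies. All numbers below are
# hypothetical and do not come from the paper.
import numpy as np

flops = np.array([1e19, 1e20, 1e21, 1e22])   # hypothetical pretraining FLOPs
ndcg = np.array([0.18, 0.24, 0.31, 0.37])    # hypothetical BEIR nDCG@10 scores

# Fit log(score) = log(a) + b * log(FLOPs)
b, log_a = np.polyfit(np.log(flops), np.log(ndcg), 1)
print(f"fitted exponent b = {b:.3f}, prefactor a = {np.exp(log_a):.3e}")

# Extrapolate to a larger (hypothetical) compute budget
print("predicted nDCG@10 at 1e23 FLOPs:", np.exp(log_a) * (1e23 ** b))
```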
Sponsor Talk
Databricks presents a new IDP Benchmark
Erich Elsen
Exhibit Hall A/B, Tue 2 Dec, 5:15–5:27 p.m. PST
Most business and enterprise documents still exist for humans first and machines second. One of our goals at Databricks is to make this human-centered data "legible" to AI and Agents, so that we can extract insights and even take actions based on them. But AI can still struggle to understand the full range of messy, unstructured documents we produce for each other. We've created a benchmark, PARQA, that probes the limits of current AI systems in analyzing a large (100,000-page) public dataset. A single, non-expert human can answer the questions with ~100% accuracy, while the best non-Databricks systems hover around 30%. We present both the benchmark and our Agent system, which significantly outperforms other Agents.
Join us for an evening of connections, conversations, and community during NeurIPS 2025. Over drinks and appetizers, connect with fellow attendees while meeting our Research and Engineering teams. Register here!
Please note that, given limited capacity, guest registrations will be placed on a waitlist and approved on a rolling basis. Thank you for your patience and understanding.
Are you interested in working with us? We’re hiring! Check out our open jobs and join our growing team.
