Session

1000× Faster Retrieval: Indexed and Full-Text Search in the Lakehouse

Overview

Experience	In Person
Track	Cybersecurity
Industry	Enterprise Technology, Consulting & Services, Financial Services
Technologies	Databricks Agents
Skill Level	Intermediate

Delta Lake and Iceberg excel at large-scale analytics, but they are not optimized for sub-second point lookups or full-text search. In operational workloads such as security analytics, log investigation and telemetry, retrieval queries that require full scans or high-cardinality filtering can take minutes to hours when seconds or milliseconds are required.

I built IndexTables, an embedded, Tantivy-based indexing layer that runs directly inside Spark executors, enabling up to 1000× faster query performance while preserving the Lakehouse model. This session explores the architecture: object-storage-hosted indexes with ACID transactions, millisecond-latency aggregations over billions of rows, native time-series bucketing for efficient GROUP BY analytics and NVMe-backed caching with proactive pre-warming.

Attendees will learn how Capital One is using IndexTables to complement Delta and Iceberg, creating lakehouse architectures that support both analytical and retrieval-heavy workloads.

Session Speakers

Scott Schenkein

VP, Distinguished Engineer
Capital One Financial