Session

1000× Faster Retrieval: Indexed and Full-Text Search in the Lakehouse

Overview

Experience: In Person
Track: Cybersecurity
Industry: Enterprise Technology, Consulting & Services, Financial Services
Technologies: Unity Catalog
Skill Level: Intermediate

Delta Lake and Iceberg excel at large-scale analytics, but they are not optimized for sub-second point lookups or full-text search. In operational workloads—such as security analytics, log investigation, and telemetry—retrieval queries that require full scans or high-cardinality filtering can take minutes to hours, when seconds or milliseconds are required.

I built IndexTables, an embedded, Tantivy-based indexing layer that runs directly inside Spark executors, enabling up to 1000× faster query performance while preserving the Lakehouse model. This session explores the architecture: object-storage-hosted indexes with ACID transactions, millisecond-latency aggregations over billions of rows, native time-series bucketing for efficient GROUP BY analytics, and NVMe-backed caching with proactive pre-warming.

Attendees will learn how Capital One is using IndexTables to complement Delta and Iceberg, creating Lakehouse architectures that support both analytical and retrieval-heavy workloads.

Session Speakers

Scott Schenkein

VP, Distinguished Engineer
Capital One Financial