Intelligent Document Processing with Lakeflow: Transforming Unstructured Data for Agents and Analytics
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Healthcare & Life Sciences, Manufacturing, Financial Services |
| Technologies | Lakeflow, Databricks Agents |
| Skill Level | Intermediate |
Much of the data that runs your business doesn't live in clean tables. It lives in millions of documents: invoices, contracts, forms, and filings. This is some of the richest context your enterprise has, and today's agents finally have a shot at unlocking it. But there's a catch: agents still struggle to read your documents. On our OfficeQA benchmark, even frontier agents score below 50% on real enterprise document tasks. Close that gap, and every agent, dashboard, and decision downstream gets sharper.
In this session, we'll show how Document Intelligence on Databricks turns unstructured documents into accurate, governed insights that power better agents and analytics. We'll walk through how to build intelligent document processing pipelines with Lakeflow and AI Functions purpose-built for document understanding, taking your documents from raw PDF to rich answers in Genie.
Walk away knowing how to turn your document corpus into the data layer that powers smarter, faster, and more trustworthy agents and analytics.
Session Speakers
Archika Dogra
Product Manager
Databricks
Jason Ping
Product Manager
Databricks