Session

From Unstructured Data to Structured Insights: How YipitData Scales Data Enrichment with AI Agents

Overview

Experience	In Person
Track	Artificial Intelligence & Agents
Industry	Enterprise Technology, Retail & Consumer Goods, Financial Services
Technologies	Databricks Apps, Databricks Agents, Lakebase
Skill Level	Intermediate

YipitData analyzes billions of unstructured data points at petabyte scale to deliver high-fidelity insights to institutional investors and Fortune 500 companies. Across hundreds of heterogeneous data sources, regex, classic ML and NLP techniques never met our accuracy hurdle, which limited our product breadth for years.

In this session, we reframe entity resolution as an agentic problem and share our production-grade enrichment platform built on Apache Spark™, Agent Bricks, Vector Search and Lakebase. This AI-native architecture continuously discovers and tags data at 90%–95% accuracy, reliably covering 60,000+ companies, a 20x improvement.

Data/ML leaders, engineers and practitioners will learn:

Modular pipeline design applicable to any classification scenario
Batch inference patterns in Spark that streamline infrastructure
Techniques for continuous, low-maintenance entity discovery

Join us and leave with a blueprint for turning enrichment bottlenecks into self-improving AI pipelines.

Session Speakers

Anup Segu

Edward Goo