Session

AI is a Data Engineering Problem

Overview

Experience	In Person
Track	Artificial Intelligence & Agents
Industry	Enterprise Technology
Technologies	AI/BI, Unity Catalog
Skill Level	Intermediate

AI agents produce orders of magnitude more text data than traditional systems: conversation logs, agent traces, tool calls, and coding assistant sessions. One company running agents can generate 70 Wikipedias of text per month, yet nobody knows how to make sense of it. Warehouses are built to query numeric columns, not multi-turn conversations. Dashboards show averages, while model failures hide in message content.

My hypothesis: making sense of AI data is a data engineering problem, not an AI modeling problem. We need new tools that enable customers to store, explore and curate AI data while scaling to trillions of messages.

I’ll give a live demo of a browser-native stack that reads Parquet and Iceberg directly from object storage, with no backend. HTTP range requests fetch only the bytes needed, and decoding happens client-side via hyparquet and Icebird, our open-source Iceberg reader. We'll navigate table snapshots, scroll through millions of LLM rows, surface failures with LLM-as-a-judge, and curate data, all in the browser.
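To make the range-request idea concrete, here is a minimal sketch of how a client could locate a Parquet file's footer without downloading the whole file. It relies only on the Parquet file layout, where an unencrypted file ends with the footer metadata, a 4-byte little-endian metadata length, and the magic bytes `PAR1`. The helper names `footerRange` and `rangeHeader` are hypothetical and written for illustration; this is not hyparquet's or Icebird's API, and a real reader may well fetch a larger tail in a single request.

```typescript
// Illustrative sketch only: footerRange and rangeHeader are hypothetical helpers.
// Parquet file tail layout: [footer metadata][uint32 LE metadata length]["PAR1"]

interface ByteRange {
  start: number; // inclusive
  end: number;   // inclusive, matching HTTP Range semantics
}

// Format an inclusive byte range as an HTTP Range header value.
function rangeHeader(r: ByteRange): string {
  return `bytes=${r.start}-${r.end}`;
}

// Given the last 8 bytes of a file and its total size, return the byte
// range that holds the footer metadata.
function footerRange(tail: Uint8Array, fileSize: number): ByteRange {
  if (tail.length !== 8) throw new Error("expected exactly 8 tail bytes");
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  // "PAR1" is the bytes 0x50 0x41 0x52 0x31, i.e. 0x31524150 read little-endian.
  if (view.getUint32(4, true) !== 0x31524150) {
    throw new Error("missing PAR1 magic: not a plain Parquet file");
  }
  const metadataLength = view.getUint32(0, true);
  const end = fileSize - 8 - 1;           // last byte before the 8-byte tail
  const start = end - metadataLength + 1; // first byte of the footer metadata
  // The file also starts with a 4-byte magic, so the footer cannot begin before byte 4.
  if (start < 4) throw new Error("footer length exceeds file size");
  return { start, end };
}

// Example: a 1000-byte file whose tail declares a 120-byte footer.
const exampleTail = new Uint8Array([120, 0, 0, 0, 0x50, 0x41, 0x52, 0x31]);
console.log(rangeHeader(footerRange(exampleTail, 1000))); // prints "bytes=872-991"
```

In this sketch, the client would first request the tail with a suffix range such as `Range: bytes=-8`, then issue a second request for the computed footer range. The footer metadata then tells the reader where each column chunk lives, so later requests can target only the columns and row groups on screen.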

Session Speakers

Kenny Daniel

CEO
Hyperparam