Session

AI is a Data Engineering Problem

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Enterprise Technology
Technologies: AI/BI, Unity Catalog
Skill Level: Intermediate
AI agents produce orders of magnitude more text data than traditional systems: conversation logs, agent traces, tool calls, and coding assistant sessions. One company running agents can generate 70 Wikipedias of text per month, and yet nobody knows how to make sense of it. Warehouses query numeric columns, not multi-turn conversations. Dashboards show averages while model failures hide in message content.

My hypothesis: making sense of AI data is a data engineering problem, not an AI modeling problem. We need new tools that enable customers to store, explore, and curate AI data while scaling to trillions of messages.

I'll demo live: a browser-native stack that reads Parquet and Iceberg directly from object storage, with no backend. Range requests fetch only the bytes needed, and decoding happens client-side via hyparquet and Icebird, our open-source Iceberg reader. Navigate table snapshots, scroll millions of LLM rows, surface failures with LLM-as-a-judge, and curate data, all in the browser.
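
To make the range-request idea concrete, here is a minimal sketch of the first step any remote Parquet reader has to take: finding the footer metadata without downloading the whole file. It relies only on the Parquet file layout, where the last 8 bytes hold a 4-byte little-endian footer length followed by the magic bytes `PAR1`. It is written in Python for brevity and is not how hyparquet or Icebird are implemented; the function names `tail_range_header` and `footer_byte_range` are hypothetical, chosen for illustration.

```python
import struct
from typing import Dict, Tuple

PARQUET_MAGIC = b"PAR1"  # magic bytes that end every Parquet file


def tail_range_header(num_bytes: int = 8) -> Dict[str, str]:
    """Build an HTTP Range header asking for only the last `num_bytes` bytes."""
    return {"Range": f"bytes=-{num_bytes}"}


def footer_byte_range(tail: bytes, file_size: int) -> Tuple[int, int]:
    """Given the last 8 bytes of a Parquet file and its total size,
    return the inclusive (start, end) byte offsets of the footer metadata."""
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("tail does not look like the end of a Parquet file")
    # First 4 bytes of the tail: footer length as a little-endian uint32.
    (footer_len,) = struct.unpack("<I", tail[:4])
    start = file_size - 8 - footer_len
    if start < 0:
        raise ValueError("footer length exceeds file size")
    end = file_size - 8 - 1
    return start, end
```

In a real client the total size would typically come from the `Content-Range` header of the first response, and the returned offsets would feed a second request of the form `Range: bytes=start-end`. From the decoded footer, a reader can then request only the column chunks a query touches.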

Session Speakers

Kenny Daniel

CEO
Hyperparam