Session

From "Hidden Costs" to "High Efficiency": Scaling DV's Lakehouse Observability

Overview

Experience: In Person
Track: Governance & Security
Industry: Communications - Media & Entertainment
Technologies: Unity Catalog
Skill Level: Intermediate

A practical guide to the internal monitoring tools we built on Databricks System Tables to make our data platform more efficient and performant. Hear directly from our Data Platform Engineering team about what worked for us and what we learned along the way.

We’ll dive into the backend strategies that transformed our platform, including:

  • Dashboards We Actually Look At: Tracking real costs and table/column sizes.
  • Practical Alerts for Real Problems: Identifying costly unused tables, tables with missing or excessive retention policies, long-running queries, and constantly failing jobs.
  • The "Orphan" Hunt: Programmatically identifying unreferenced files and un-vacuumed data by auditing Delta logs against cloud storage.
  • The Cost/Read Metric: Using recursive lineage queries to expose expensive, high-maintenance tables with little or no downstream value.
  • Footer Analysis: Scanning Parquet footers at scale to isolate metadata bloat from actual data, even in complex nested types.
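
To make the "Orphan" Hunt item concrete, here is a minimal sketch of the general idea, not the tooling presented in the session. It replays `add` and `remove` actions from Delta-style JSON commit lines and compares the result with a storage listing. The function name `classify_files` and the sample paths are hypothetical. The sketch also assumes the log is available as plain JSON lines and ignores checkpoints, path encoding, and retention windows, all of which a real audit would need to handle.

```python
import json


def classify_files(log_lines, storage_paths):
    """Replay add/remove actions, then classify the files found in storage.

    Returns (unvacuumed, unreferenced):
      unvacuumed   - files the log has removed but that still exist in storage
      unreferenced - files in storage that the log never mentions
    """
    active, removed = set(), set()
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            path = action["add"]["path"]
            active.add(path)
            removed.discard(path)
        elif "remove" in action:
            path = action["remove"]["path"]
            active.discard(path)
            removed.add(path)

    # The transaction log itself is not table data, so skip it.
    data_files = {p for p in storage_paths if not p.startswith("_delta_log/")}
    unvacuumed = sorted(data_files & removed)
    unreferenced = sorted(data_files - active - removed)
    return unvacuumed, unreferenced


# Hypothetical sample: one file was removed by a later commit, and one file
# in storage was never written through the log.
log_lines = [
    '{"add": {"path": "part-0001.parquet"}}',
    '{"add": {"path": "part-0002.parquet"}}',
    '{"remove": {"path": "part-0001.parquet"}}',
]
storage_paths = [
    "_delta_log/00000000000000000000.json",
    "part-0001.parquet",
    "part-0002.parquet",
    "part-0099.parquet",
]

unvacuumed, unreferenced = classify_files(log_lines, storage_paths)
print("un-vacuumed:", unvacuumed)
print("unreferenced:", unreferenced)
```

Keeping the two categories separate matters because they usually call for different actions: un-vacuumed files are typically cleaned up by running VACUUM, while unreferenced files suggest writes that bypassed the table or failed partway through.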

Session Speakers

Saul Tawil

Senior Data Engineer
DoubleVerify