Data + AI Summit 2022

Connecting the Dots with DataHub: Lakehouse and Beyond

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Room

  • Moscone South | Level 2 | 202

Duration

  • 35 min

Overview

You’ve successfully built your data lakehouse. Congratulations! But what happens when your operational data stores, streaming systems like Apache Kafka, or data ingestion systems produce bad data into the lakehouse? Can you be proactive about preventing bad data from affecting your business? How can you use automation to ensure that raw data assets become well-maintained data products (with clear ownership, documentation and sensitivity classification) without requiring people to do redundant work across operational, ingestion and lakehouse systems? How do you get live and historical visibility into your entire data ecosystem (schemas, pipelines, data lineage, models, features and dashboards) within and across your production services, ingestion pipelines and data lakehouse? Data engineers struggle with data quality and data governance issues that constantly interrupt their day and limit their impact on the business.

In this talk, we will share how data engineers from our 3K+ member DataHub community are using DataHub to track lineage, understand data quality, and prevent failures from impacting their important dashboards, ML models and features. The talk will include details of how DataHub automatically extracts lineage from Spark and schemas and statistics from Delta Lake, as well as shift-left strategies for developer-led governance.

Session Speakers

Shirshanka Das

CEO and Co-Founder

Acryl Data
