Data + AI Summit 2022
Watch on demand

Connecting the Dots with DataHub: Lakehouse and Beyond

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Room

  • Moscone South | Level 2 | 202

Duration

  • 35 min

Overview

You’ve successfully built your data lakehouse. Congratulations! But what happens when your operational data stores, streaming systems like Apache Kafka, or data ingestion systems produce bad data into the lakehouse? Can you be proactive about preventing bad data from affecting your business? How can you take advantage of automation to ensure that raw data assets become well-maintained data products (with clear ownership, documentation and sensitivity classification) without requiring people to do redundant work across operational, ingestion and lakehouse systems? How do you get live and historical visibility into your entire data ecosystem (schemas, pipelines, data lineage, models, features and dashboards) within and across your production services, ingestion pipelines and data lakehouse? Data engineers struggle with data quality and data governance issues that constantly interrupt their day and limit their impact on the business.

In this talk, we will share how data engineers from our 3K+ strong DataHub community are using DataHub to track lineage, understand data quality, and prevent failures from impacting their important dashboards, ML models and features. The talk will cover how DataHub automatically extracts lineage from Spark and schema and statistics from Delta Lake, as well as shift-left strategies for developer-led governance.

Session Speakers

Shirshanka Das

CEO and Co-Founder

Acryl Data
