SESSION

Sponsored by: lakeFS | Why Version Control is Essential for Your Lakehouse Architecture

OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
TRACK: Data Lakehouse Architecture
INDUSTRY: Enterprise Technology
TECHNOLOGIES: Apache Spark, Delta Lake
SKILL LEVEL: Intermediate
DURATION: 20 min

When developing and maintaining data/ML pipelines on Databricks, we tend to adopt practices that improve the quality and velocity of code development and deployment. How do we do the same for the data that is the basis of our data products? We must be able to experiment during development, test data quality in isolation, automate quality validation tests, work with full reproducibility of data pipelines, and more. If your product’s value is derived from data, in the shape of analytics or machine learning, poor data quality or a lack of reproducibility of data plus code can easily translate into pain. In this session, you will discover how to apply engineering best practices to data products through data version control with lakeFS.
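The isolation-and-validation workflow described above can be sketched as follows. This is a minimal illustration, not code from the session: it assumes a lakeFS server exposing its S3-compatible gateway, a Spark session configured against it, and illustrative repository/branch names (`analytics`, `exp-dedup-v2`); the `lakefs_path` helper is hypothetical.

```python
def lakefs_path(repo: str, ref: str, path: str) -> str:
    """Build an s3a:// URI addressing `path` at a given lakeFS branch or commit.

    With lakeFS's S3 gateway, the first path segment is the repository and the
    second is the ref (branch name or commit ID), so the same pipeline code can
    target production data or an isolated experiment branch just by changing
    the ref.
    """
    return f"s3a://{repo}/{ref}/{path}"

# Production reads from the main branch; experiments write to their own branch.
prod_input = lakefs_path("analytics", "main", "events/")
experiment_out = lakefs_path("analytics", "exp-dedup-v2", "events_clean/")

# With a configured SparkSession this would look like:
#   df = spark.read.parquet(prod_input)
#   df.dropDuplicates(["event_id"]).write.parquet(experiment_out)
#
# Quality validation tests then run against the experiment branch in
# isolation; only if they pass is the branch merged back, e.g. with:
#   lakectl merge lakefs://analytics/exp-dedup-v2 lakefs://analytics/main
```

Because a ref can also be a commit ID, pinning a pipeline run to a specific commit gives the full data-plus-code reproducibility the abstract mentions.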

SESSION SPEAKERS

Oz Katz

CTO & Co-creator of lakeFS
lakeFS