SESSION

Fabricator: Streamlining Declarative Feature Engineering at DoorDash

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Science and Machine Learning
INDUSTRY: Enterprise Technology, Retail and CPG - Food
TECHNOLOGIES: AI/Machine Learning, Apache Spark, Delta Lake
SKILL LEVEL: Intermediate
DURATION: 40 min

Feature engineering is a crucial part of machine learning, and it presents challenges that general data engineering does not. We developed Fabricator, a framework for building declarative data pipelines for machine learning at DoorDash. Fabricator orchestrates 1,400 daily batch jobs and manages 2.2 trillion feature values across all business verticals. It combines a job registry, a library for large-scale data ELT jobs, and an orchestration and execution service. Together these streamline feature development through a declarative feature DSL and a centralized repository, accelerate data fabrication with a high-level SDK, reduce latency and consistency discrepancies between offline and online feature data, and automate operational tasks such as batch ETL jobs, feature uploads, and real-time feature computation. We will discuss how we used Databricks Jobs and Delta Lake to build Fabricator and share what we learned.
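
To make the idea of a declarative feature definition and a central job registry concrete, here is a minimal Python sketch. It is not Fabricator's DSL or API, which this abstract does not show. `FeatureSpec`, `JobRegistry`, `render_sql`, and all table, column, and feature names are hypothetical and exist only for illustration. The point is that a feature is described as data, and the framework derives the query and the job grouping from that description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical declarative description of one feature (not Fabricator's schema)."""
    name: str          # feature name, unique within the registry
    entity: str        # key the feature is grouped by, e.g. "store_id"
    source_table: str  # upstream table the feature is computed from
    aggregation: str   # SQL aggregate function name, e.g. "avg" or "count"
    column: str        # column the aggregate is applied to


@dataclass
class JobRegistry:
    """Hypothetical central registry that turns registered specs into batch jobs."""
    specs: dict = field(default_factory=dict)

    def register(self, spec: FeatureSpec) -> None:
        # Reject duplicates so every feature name maps to exactly one definition.
        if spec.name in self.specs:
            raise ValueError(f"duplicate feature name: {spec.name}")
        self.specs[spec.name] = spec

    def plan_jobs(self) -> dict:
        """Group feature names by (source_table, entity): one batch job per group."""
        jobs = {}
        for spec in self.specs.values():
            jobs.setdefault((spec.source_table, spec.entity), []).append(spec.name)
        return jobs


def render_sql(spec: FeatureSpec) -> str:
    """Derive a simple aggregation query from the declarative spec."""
    return (
        f"SELECT {spec.entity}, {spec.aggregation.upper()}({spec.column}) AS {spec.name} "
        f"FROM {spec.source_table} GROUP BY {spec.entity}"
    )


# Usage: features are declared, not hand-written as pipelines.
registry = JobRegistry()
registry.register(FeatureSpec("store_avg_order_value", "store_id", "orders", "avg", "order_total"))
registry.register(FeatureSpec("store_order_count", "store_id", "orders", "count", "order_id"))
registry.register(FeatureSpec("consumer_order_count", "consumer_id", "orders", "count", "order_id"))

jobs = registry.plan_jobs()
example_query = render_sql(registry.specs["store_avg_order_value"])
```

In this sketch the two store-level features share a source table and entity, so they fall into the same planned job, while the consumer-level feature gets its own. A real system would also need scheduling, storage, and online upload, which are out of scope here.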

SESSION SPEAKERS

Kunal Shah

Software Engineer
DoorDash

Hebo Yang

ML Infra Eng
DoorDash