dbt + Machine Learning: What Makes a Great Baton Pass?
- Moscone South | Upper Mezzanine | 156
- 35 min
This presentation is organized as a series of questions that move from discovering the current state to imagining the future state. The goal isn’t to pick a winning workflow or tool. It’s to broaden and deepen the audience’s imagination for what a dbt + machine learning workflow can become, to minimize the territorialism that can come with shared data workflows, and to help people see what matters: people working well together to make effective, evidence-based decisions.
- **What does a baton pass look like today?**
  - I’ll illustrate a realistic scenario with a live demo and/or prepared code snippets
  - We’ll then interrogate this workflow story with the questions that arise from what’s going well and what isn’t:
    - “How do we prove this machine learning model is predicting correct results?”
    - “How do we elegantly track the machine learning model’s performance?”
    - “What happens if the table structure changes?”
    - “How do we track data lineage from dbt to Jupyter notebook operations to API deployment?”
- **Core behaviors and outcomes**
  - I’ll parse out the fundamental themes illustrated in the story above. For example: when things go wrong in production, who fixes what? How are KPIs tracked over time? How is data lineage maintained?
- **What’s being done about this today?**
  - I’ll go through quick, live demos of approaches grouped by category. Not every tool will be demoed, only a representative from each category. Example categories:
    - Python UDFs pushed down to data warehouse/lakehouse compute that dbt can reference
    - Bring ML modeling into the SQL workflow
    - Build a better notebook experience that makes Python and SQL more interoperable
    - Make dbt integrate natively with multiple languages
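To make the question “What happens if the table structure changes?” concrete, here is a minimal sketch of an explicit contract at the baton pass: the ML side declares the columns it expects from a dbt model and checks incoming rows before scoring. It assumes rows arrive as plain Python dicts. The column names, `EXPECTED_SCHEMA`, `ContractReport`, and `check_contract` are all hypothetical illustrations, not part of dbt or any ML library.

```python
from dataclasses import dataclass

# Hypothetical contract: columns (and Python types) the ML code expects
# from the dbt model it consumes. Names are illustrative only.
EXPECTED_SCHEMA: dict[str, type] = {
    "customer_id": int,
    "tenure_months": int,
    "monthly_spend": float,
}


@dataclass
class ContractReport:
    missing: list[str]          # expected columns absent from at least one row
    unexpected: list[str]       # columns present that the contract doesn't mention
    type_mismatches: list[str]  # expected columns holding a value of the wrong type

    @property
    def ok(self) -> bool:
        # Extra columns are reported but do not break the contract.
        return not self.missing and not self.type_mismatches


def check_contract(rows: list[dict], expected: dict[str, type]) -> ContractReport:
    """Compare rows handed over from a dbt model against the expected schema."""
    missing: set[str] = set()
    unexpected: set[str] = set()
    mismatches: set[str] = set()
    for row in rows:
        missing |= set(expected) - row.keys()
        unexpected |= row.keys() - set(expected)
        for col, typ in expected.items():
            value = row.get(col)
            # NULLs are allowed here; a stricter contract could reject them.
            if value is not None and not isinstance(value, typ):
                mismatches.add(col)
    return ContractReport(sorted(missing), sorted(unexpected), sorted(mismatches))


# Usage: a new "region" column appeared and "tenure_months" arrived as text.
rows = [
    {"customer_id": 1, "tenure_months": 12, "monthly_spend": 49.5},
    {"customer_id": 2, "tenure_months": "7", "monthly_spend": 20.0, "region": "EU"},
]
report = check_contract(rows, EXPECTED_SCHEMA)
print(report.ok, report.unexpected, report.type_mismatches)
```

The type check is deliberately strict: an integer such as `20` in `monthly_spend` would be flagged, which surfaces upstream casting changes early but may need loosening in practice. dbt’s own model contracts can enforce column names and data types on the warehouse side; a check like this one covers the consuming ML code, so each side of the baton pass knows when the other has changed.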