Robin is a PM at Intuit, maker of TurboTax, Mint, and QuickBooks. His team is helping to build a data mesh to organize and make accessible the data that has been collected over four decades across multiple businesses. In a previous life, he worked on the data science platform Domino Data Lab, applied machine learning to satellite imagery at Planet, and co-founded GlobalForestWatch.org, which uses satellite imagery to help people around the world protect forests. In his spare time, Robin tries to keep two toddlers alive and squeeze in a bit of reading and cycling.
May 27, 2021 03:50 PM PT
At Intuit, we have a lot of data, including a lot of duplicate data collected over decades. So we built a rule-based, self-serve tool to identify and merge duplicate records. It takes experimentation and iteration to get deduplication just right for hundreds of millions of records, and spreadsheet-based tracking just wasn't enough. We now use MLflow to automatically capture execution notes, rule settings, weights, key validation metrics, and more, all without requiring end-user action. In this talk, we'll cover our use case and explain why MLflow is useful outside its traditional MLOps use cases.
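To make the idea concrete, here is a minimal sketch of how a rule-based deduplication run might record its own settings and results in MLflow. It is not Intuit's tool: the field names, weights, threshold, and the functions `match_score`, `find_duplicates`, and `run_and_log` are all hypothetical, and the matching rule (exact agreement on weighted fields) is an assumption chosen for brevity. Only the MLflow calls (`start_run`, `set_tag`, `log_params`, `log_param`, `log_metrics`) are real library functions.

```python
from itertools import combinations

# Hypothetical rule weights and threshold; the abstract does not give the real ones.
RULE_WEIGHTS = {"email": 0.6, "phone": 0.3, "name": 0.1}
MATCH_THRESHOLD = 0.6


def match_score(a, b, weights):
    """Sum the weights of the fields on which two records agree exactly."""
    return sum(
        w for field, w in weights.items()
        if a.get(field) and a.get(field) == b.get(field)
    )


def find_duplicates(records, weights, threshold):
    """Return index pairs whose weighted match score reaches the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if match_score(a, b, weights) >= threshold
    ]


def run_and_log(records, weights, threshold, note=""):
    """Run deduplication and record its settings and metrics as one MLflow run."""
    import mlflow  # imported lazily so the matching logic works without MLflow

    with mlflow.start_run():
        mlflow.set_tag("execution_note", note)
        mlflow.log_params({f"weight_{k}": v for k, v in weights.items()})
        mlflow.log_param("match_threshold", threshold)
        pairs = find_duplicates(records, weights, threshold)
        mlflow.log_metrics(
            {"n_records": len(records), "n_duplicate_pairs": len(pairs)}
        )
    return pairs
```

Because the logging lives inside `run_and_log`, a user who calls it with a list of record dictionaries gets every experiment tracked without doing anything extra, which is the property the abstract describes. Each change to the weights or threshold then shows up as a separate, comparable run rather than a row in a spreadsheet.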