Maya is a software engineer at Intuit, where her team focuses on big data solutions. She enjoys coding in various languages, including Python, Scala, Node.js, and Go. Before moving into software engineering, Maya worked as a data scientist for multiple startups. She pursued a PhD (ABD) in mechanical engineering, focusing on computational models for renewable energy. In her spare time, Maya enjoys reading books and painting.
May 27, 2021 03:50 PM PT
At Intuit, we have a lot of data, and a lot of duplicate data, collected over decades. So we built a rule-based, self-serve tool to identify and merge duplicate records. Getting deduplication right for hundreds of millions of records takes experimentation and iteration, and spreadsheet-based tracking just wasn't enough. We now use MLflow to automatically capture execution notes, rule settings, weights, key validation metrics, and more, all without requiring end-user action. In this talk, we'll discuss our use case and why MLflow is useful outside its traditional MLOps use cases.
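To make the idea concrete, here is a minimal sketch of what rule-based matching with tracked settings might look like. It is not Intuit's tool: the field names, weights, threshold, and the helper names (`field_similarity`, `match_score`, `find_duplicates`, `log_run`) are all hypothetical, and string similarity from Python's `difflib` stands in for whatever matching rules the real system uses. The only MLflow calls assumed are the standard tracking functions `start_run`, `set_tag`, `log_params`, `log_param`, and `log_metrics`.

```python
from difflib import SequenceMatcher

# Hypothetical rule settings (field -> weight); not the actual rules from the talk.
RULE_WEIGHTS = {"name": 0.5, "email": 0.3, "phone": 0.2}
MATCH_THRESHOLD = 0.85  # assumed cutoff above which two records count as duplicates


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values after basic normalization."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0  # a missing value should never count as evidence of a match
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted sum of per-field similarities for one pair of records."""
    return sum(
        weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in weights.items()
    )


def find_duplicates(records: list, weights: dict, threshold: float) -> list:
    """Return index pairs (i, j), i < j, whose score meets the threshold.

    Compares every pair, which is fine for a demo but would need blocking
    or indexing at hundreds of millions of records.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if match_score(records[i], records[j], weights) >= threshold:
                pairs.append((i, j))
    return pairs


def log_run(weights: dict, threshold: float, metrics: dict, note: str) -> None:
    """Record one experiment's settings and metrics in MLflow."""
    import mlflow  # imported lazily so the matching code runs without MLflow installed

    with mlflow.start_run():
        mlflow.set_tag("execution_note", note)  # hypothetical tag name
        mlflow.log_params({f"weight_{field}": w for field, w in weights.items()})
        mlflow.log_param("match_threshold", threshold)
        mlflow.log_metrics(metrics)  # values must be numeric
```

In a setup like this, the tool itself would call `log_run` at the end of every deduplication job, for example with `{"duplicate_pairs": float(len(pairs))}` as the metrics, so each change to a weight or threshold is recorded alongside its results without the end user filling in a spreadsheet.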