Raghav Karnam leads ML Engineering at Plume, serving the models that affect more than 500 million devices on our network. His team focuses on all aspects of ML:
1) Infrastructure & Tooling for ML
2) Data gathering (labeling, tooling)
3) ML neural network training
4) The science of making it work (e.g. device typing, anomaly detection, temporal prediction)
5) ML deployment in production, running at scale
May 26, 2021 04:25 PM PT
We started out processing big data using AWS S3, EMR clusters, and Athena to serve Analytics data extracts to Tableau BI.
However, as our data and team sizes grew, the Avro schemas of our source data evolved, and we attempted to serve analytics data through web apps, we hit a number of limitations in the AWS EMR and Glue/Athena approach.
This is the story of how we scaled out our data processing and boosted team productivity to meet the current demand, from numerous internal business teams and our 150+ CSP partners, for insights from 20M+ smart homes and 500M+ devices across the globe.
We will describe the lessons learned and best practices established as we equipped our teams with Databricks autoscaling job clusters and notebooks and migrated our Avro/Parquet data to use the metastore, SQL Endpoints and the SQLA Console, while charting the path to Delta Lake...