Spark’s support for efficient execution and rapid interactive prototyping enables novel approaches to understanding data-rich domains that have historically been underserved by analytical techniques. One such field is endurance sports, where athletes are faced with GPS and elevation traces as well as samples from heart rate, cadence, temperature, and wattage sensors. These data streams can be somewhat comprehensible at any given moment, when looking at a small window of samples on one’s watch or cycle computer, but are overwhelming in the aggregate.
In this talk, I’ll present my recent efforts using Spark and MLlib to mine my personal cycling training data for deeper insights and to design workouts that meet particular fitness goals. This work incorporates analysis of geographic and time-series data, computational geometry, visualization, and domain knowledge of exercise physiology. I’ll show how Spark made this work possible, demonstrate some novel techniques for analyzing fitness data, and discuss how these approaches could be applied to make sense of data from an entire community of cyclists.
William Benton is passionate about making it easier for machine learning practitioners to benefit from advanced infrastructure and making it possible for organizations to manage machine learning systems. His recent roles have included defining product strategy and professional services offerings related to data science and machine learning, leading teams of data scientists and engineers, and contributing to many open source communities related to data, ML, and distributed systems. Will was an early advocate of building machine learning systems on Kubernetes and developed and popularized the “intelligent applications” idiom for machine learning systems in the cloud. He has also conducted research and development related to static program analysis, language runtimes, cluster configuration management, and music technology.