How Azure Databricks and PySpark Helped Make IoT Analytics a Reality

At Lennox International, thousands of IoT-connected devices stream data into the Azure platform at a minute-level polling interval. The challenge was to combine these data sets with external data sources such as weather and to predict equipment failure with a high level of accuracy, along with the patterns and parameters that influence it. Previously, the team used a combination of on-premises and desktop tools to run algorithms on a sample set of devices. The result was low accuracy (around 65%) from a process that took more than 6 hours.

The team had to work through several data orchestration challenges and identify a machine learning platform that enabled collaboration among our engineering SMEs, data engineers, and data scientists. The team decided to use Azure Databricks to build the data engineering pipelines and appropriate machine learning models, and to extract predictions using PySpark. To make the learning more sophisticated, the team worked with a variety of Spark ML models such as Gradient Boosted Trees and Random Forest. The team also implemented stacking and ensemble methods using H2O Driverless AI and Sparkling Water on Azure Databricks clusters, which can scale up to 1000 cores.

Join us in this session to see how this work resulted in models that run in 40 minutes with minimal tuning and predict failures with an accuracy of about 90%.

Session hashtag: #Ent7SAIS

About Janath Manohararaj

Janath Manohararaj is a Principal Big Data Architect with Lennox International. In his current role, Janath drives Big Data platform architecture and design to create a hybrid ecosystem that enables Data Science and BI teams to gain insights and deliver operational excellence. Prior to this role, Janath worked extensively on data engineering, including optimizing and tuning the performance of Apache Spark in a wide variety of projects. He has more than 10 years of experience working on massively parallel processing and analytical systems. Janath holds a master's degree in Computer Science from Portland State University.

About Prasad Chandravihar

Prasad Chandravihar is a Lead Data Scientist at Lennox International. He has been with Lennox for the past 5 years and has worked on data science projects across domains such as Supply Chain, Marketing, Tech Support, and Engineering. In his current capacity, he leads the machine learning team focused on executing strategic initiatives across different verticals. Prasad has completed an executive program on Artificial Intelligence at the Massachusetts Institute of Technology Sloan School of Management and holds a bachelor's degree in Computer Science.