Lessons Learned Developing and Managing High Volume Apache Spark Pipelines in Production


Quby is the creator and provider of Toon, a leading European smart home platform. We enable Toon users to control and monitor their homes using both an in-home display and an app. As a data-driven company, we use machine learning algorithms to generate actionable insights for our end users. We have developed data-driven services that help users avoid needlessly wasting energy and alert them in real time to problems with their heating system.

In this talk, Erni will describe our journey of productionizing data science algorithms. We’ll take a deep dive into our pipeline and describe our streamlined development and deployment workflow. We’ll explain how we define and manage dependencies between jobs in multiple environments (test, acceptance and production) and how we schedule the pipeline computation. We’ll delve into scaling challenges, metrics, monitoring and data quality. Finally, we will reflect on the lessons learned while building high-volume infrastructure that offers multiple data-driven services to hundreds of thousands of users.

Session hashtag: #SAISML4



About Erni Durdevic

Erni Durdevic is a Senior Machine Learning Engineer at Quby, a leading company offering data-driven home services technology, known for creating the in-home display and smart thermostat Toon. In this role, he is responsible for building end-to-end data science products. He enjoys pairing with data scientists and data engineers to transform proofs-of-concept into products running at scale. Erni has a Master's degree in Computer Science Engineering and a passion for tackling the world's toughest problems using data and AI.