Season 1, Episode 5
Combining Machine Learning and MLflow with your Lakehouse
Quby aims to “outsmart energy” – making the world easier, more comfortable, and more sustainable. They collect a huge amount of IoT sensor data and have recently moved from batch to real-time streaming processes for data collection and machine learning. For this episode, we chat with Ellissa Verseput, ML Engineer at Quby, to discuss how Quby leverages ML to extract additional value from their data lake and how they manage this process.
Ellissa Verseput is a Data Scientist at Quby, a leading company offering data-driven home services technology. In this role she is responsible for developing end-to-end data-driven services that enable commodity suppliers such as utilities and insurance companies to play a dominant role in the home services domain. Thanks to her previous experience in software and data engineering, she enjoys robust productionizing and building bridges between data science and her colleagues from other disciplines at Quby. Ellissa has a master’s degree in Econometrics & Operations Research and has been working in the IT & data science field since 2016.
All right. Welcome to Data Brew by Databricks with Denny and Brooke. The series allows you to explore various topics in the data and AI community. Whether we’re talking about data engineering or data science, we will interview subject matter experts to dive deeper into these topics. And while we’re at it, we’ll be enjoying our morning brew. My name is Denny Lee and I’m a developer advocate here at Databricks.
And I’m Brooke Wenig, machine learning practice lead at Databricks. For this episode, we’d like to introduce Ellissa Verseput, machine learning engineer at Quby to discuss how Quby leverages machine learning to extract additional value from their data lake and how they manage this process. I’ve been working with Ellissa and the team from Quby for two years and I’m thrilled to have her on the show with us today. To start off, would you mind giving a bit of an introduction and also a little bit of a background as to what Quby is since I know not everybody in the States is familiar with Quby.
Sure. Thanks, Brooke and Denny. I’m very glad that you guys have me on today. Yeah, let’s start with myself. I am Ellissa and I’ve been a machine learning engineer at Quby for two years now. And actually I came into the field of machine learning engineering via a bit of a side track. I studied econometrics in the Netherlands, that’s where I live. That was actually my major, that’s what I graduated in, and after that I got a bit of software engineering experience, after which I found that I wanted to go back to my theoretical data knowledge and combine it with software engineering. And that’s how I ended up at Quby doing machine learning.
And then to what Quby actually is. Quby is a tech company in the Netherlands. It’s Netherlands based, but has an international orientation. We have lots of people from all over the world working for us in Amsterdam, and we are in the smart energy domain. We do all kinds of things with IoT data coming from our own smart thermostat, which we have deployed in several countries in Europe, and which is gathering a lot of energy data from all the domestic households we’re connected to. And this data is basically the start for the data team at Quby to build all kinds of nice data-driven services that help people save energy and use their energy more efficiently.
Could you talk a little bit more about how you’re helping people save energy? What kind of recommendations you’re doing? What kind of data analysis are you doing with all of this data from your IoT sensors?
Oh yeah, sure. Actually there are a couple of things we do. There’s the smart thermostat functionality. That’s basically the core functionality that we give to our end customers or end users. You can think of this as starting from basic control algorithms to control the heating in the house, but also, from there, giving nice recommendations and basically suggesting updates to your current settings on your device to make sure that, given the user’s behavior, it’s controlling the heat in the best way possible without waste.
You can think of that as the following. A user is sometimes at home and requires heat. Sometimes they’re not at home or are asleep and don’t need heat. If we can somehow help them match that behavior to how the heat is controlled in the house, then you have a perfect match, in a sense, because you’re only using heat, only using energy, when you really need to, and you’re not using energy when you don’t need to, which would be a waste. That’s one thing that we really try to help the users with, by basically leveraging the data, seeing patterns, and giving them back to the user in a friendly way.
Other things are actually more in the domain of electricity usage in general, because, well, a smart thermostat is all about controlling the heat, and at least in Europe and the Netherlands that heat is generated using gas, but electricity consumption is also a huge thing in domestic homes. There too, we try to help users out by giving them feedback about how they’re using their appliances. We can actually see that with the data that the smart thermostat also gathers, and we try to help them get more insight into how to reduce particular energy usage.
It seems like there’s a lot of data that you’re potentially processing here. When you built up this data platform, what tools did you start off with? Did you start by using, for example, Apache Spark to build your data lakes to store all this IoT data?
Yeah. Good question. To be honest, when I started at Quby, that was two years ago, and by then Apache Spark and also Databricks were already kind of kicking ass, so I had a very warm welcome in that sense in the data team. But I’ve been talking to some colleagues and I know how it was before, and it was actually kind of a cool story. It’s almost like from the lean book, so to say, because four to five years ago, Quby was actually mostly a hardware company. We had the smart thermostat, we were out there already at quite a few houses, at least in the Netherlands, but we were not really taking the data to the next level yet, so to say.
What happened then is that a team was assigned to see what we could do with that data beyond just controlling the heat, beyond the control algorithms that were running on the smart thermostat itself, which is basically one Linux kernel running simple control algorithms, so to say. What happened then was a lot of experiments, a lot of testing, a lot of looking at what’s feasible and what’s not with all that gas and electricity data that was actually already collected. It was collected in a central place, so we could actually do something with it. And once the first pilots, the first proofs of concept, were ready and were proving value for the users, only then was something more robust needed. Something like Apache Spark to scale up, because we were talking about 300,000 users that we wanted to serve with these new data-driven services, so to say.
That was actually when we brought in some more people skilled in machine learning and data engineering, and that was also when we got started with Apache Spark, and especially Databricks, quite soon, basically to transform the proof of concept into something that we could serve to all these users. When I think of that, I’m quite proud of the history of Quby, that they did that in such a lean way, going from a proof of concept towards scaling up, and in that process basically setting up a platform that enabled us afterwards to do more and more and more, because that was just the beginning actually. After that, it became only easier and easier to do the next data-driven service, and the next, and the next. And that’s pretty cool, I would say.
Well then, related to that, because of all this exponential growth — you’re talking about IoT data — what were some of the implications of dealing not just with batch data stored in these data lakes, but also streaming data?
Yeah. That’s definitely something we needed to give attention to. Especially, I think, about a year and a half ago that really became a thing, and one year ago we started doing it. Whereas we were first just processing everything in batch, and every night we were picking up all the data from the last day and going from basically the most raw form to — I think you guys call it the gold stage in some contexts — basically the end results that you can serve to a user. And it was all done overnight, taking several hours; if nothing went wrong, it was done by the end of the morning. It had a lot of time constraints, actually. Once we had the opportunity, together with the new Delta service that was launched and the streaming framework of Spark getting more mature, we basically started to switch this batch job, the one from the most raw data we have to a bit more refined and processed data.
We transformed that into a streaming job, actually, and that helped us so much, because now we don’t have to wait for ages at the start of the day to process the whole batch of the last day. We can just start already at a much more mature level of the data, say early in the morning, and do more and more refinement to get to the end results. Those kinds of jobs after the first processing we still do in batch, because that’s still quite logical: most things we do as a use case, as a feature for the user, make sense in the context of a day. But just processing that huge bulk of raw data during the whole day, as it comes in — that has given us so much speed and flexibility, so to say. That’s really nice.
I know that you work on the machine learning side of things. How has using Delta Lake helped accelerate your processes, or, as a machine learning engineer, how has Delta made this process a whole lot simpler?
Yeah. I think a nice example, one that has been pretty recent for us, is that we’re actually super flexible in using particular data both in batch and in streaming. We didn’t have to bother about how to do that anymore; that was just basically out of the box using Delta. We always were basically reading some data in batch, putting it into a machine learning model to do a prediction, and the outcome of that model was a use case for the user. We then saw an opportunity to basically reuse that whole flow — data ingested into the model and given out to the user — in a streaming fashion.
We saw an opportunity there to basically reuse that model, reuse that input data, but now going from batch to streaming, and in that way giving the user, I’d say, an almost near real-time experience, whereas before we had to do that a day later, so to say. And yeah, I think that’s a good example, because we’re still running the batch job on top of the data next to the streaming job, and that is quite flexible. It’s really nice to not have to bother choosing between two options, but just have them both at the same time.
What type of frameworks are you using for machine learning? Are you using scikit-learn? Spark ML? Which types of libraries are you using? And I’m sure people would also be interested in what types of algorithms or models you’re building too.
Yeah, sure. The models and frameworks we’re using actually quite depend on the use case. We’re basically using several different libraries all over the code base, but mostly scikit-learn and Spark ML. We have several models, some trained online and some trained offline, that are used daily to make predictions. The online trained models are actually oftentimes done using Spark ML, because Spark ML also integrates nicely with Spark Scala, and most of our pipelines are actually built in Spark Scala because we like the tightness you get with that. It’s also a bit more robust for testing. And when you want to run a model in between — when you basically read some data, process some data, and then run a model — it’s kind of nice to stay in the same context, to stay within Spark Scala.
So there, we often use some Spark ML models to get the job done. But sometimes, especially for the more heavy data scientists at Quby — let’s say Spark ML has several options, but it doesn’t have them all, right? Sometimes for clustering, it’s not the best. Implementations there are often custom and done by the open source community, sometimes not up to date with, for instance, Spark 3.0 or these kinds of things. Then, if it’s possible in terms of the data size, we sometimes move to scikit-learn and leverage some of the algorithms that are out there, so we’re not dependent on Spark versions or basically third parties or open source communities to keep our algorithms up to date.
Yeah, that’s a very common concern that I’ve had many customers raise of, hey, they love this algorithm in scikit-learn, but oops, Spark ML doesn’t have it. There is a third party package, but they don’t want to leverage some third party package that isn’t fully supported. My question to you is for those use cases where you build a model using scikit-learn, do you leverage Spark either for distributed training or distributed hyperparameter tuning of those models?
Yeah, good question. It is something we’re also, like you say, looking out for ourselves — how to make that combination. What we’re actually doing in some places in our code base is: we have, say, a scikit-learn model that we trained offline with a particular sample of our data, and then it’s just sitting there as an artifact that we use daily to make predictions. We wrap it in a Pandas UDF, and then we can use it in a distributed fashion to predict for all our customers, for all our users, on a daily basis. That’s something we do, and I guess that makes, at least for prediction, a nice combination of using Spark and scikit-learn together.
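[Editor’s note: the pattern Ellissa describes — an offline-trained scikit-learn model wrapped for distributed scoring — might look roughly like the sketch below. The column name, data, and model are invented for illustration; this is not Quby’s actual code.]

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Offline step: train a small model on a sample and keep it as an artifact.
# In practice this would be persisted (e.g. with joblib) and loaded daily.
train = pd.DataFrame({"usage": [1.0, 2.0, 3.0, 4.0],
                      "target": [2.1, 3.9, 6.2, 8.0]})
model = LinearRegression().fit(train[["usage"]], train["target"])

def predict_batch(usage: pd.Series) -> pd.Series:
    """Score one batch of rows with the pre-trained model."""
    return pd.Series(model.predict(usage.to_frame("usage")))

# In Spark, the same function can be registered as a Pandas UDF so the
# single-node model is applied partition by partition across the cluster:
#
#   from pyspark.sql.functions import pandas_udf
#   predict_udf = pandas_udf(predict_batch, returnType="double")
#   df = df.withColumn("prediction", predict_udf("usage"))

preds = predict_batch(pd.Series([5.0, 6.0]))
print(len(preds))
```

The key point is that the scoring function only ever sees plain pandas objects, so the exact same code runs in a unit test, a notebook, or inside a Spark job.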
Perfect. Well, in that case, what are some of the new use cases that you’re looking at on the horizon right now?
You mean use cases and features that we’re looking into for the user? Also an interesting one, something I’ve been working on a lot myself as well. At Quby, our customers — as in, not the end customers, but the companies we cooperate with in order to get a product to the end customer — are oftentimes energy providers, basically the companies that you go to to get a gas or electricity contract. Those companies often work together with us in order to get our smart thermostat out to the users. But we also do lots of things together with them in the context of smart meters. Is that actually a thing in the States, smart meters? I think so. Basically these are just meters at people’s houses that automatically collect the gas and electricity data. That’s also a new source that we’ve been leveraging for about two years now to build different data-driven services on top of.
Long introduction to something that those companies are especially interested in, and therefore we are as well: what’s next in the upcoming decade, so to say? Because gas and electricity contracts are still a big thing now, but in the context of climate change and upgrading your house to be ready for the future, there are a lot of interesting topics like solar panels, heat pumps, insulation — think about these kinds of things. Our next features are all a bit in the context of: how can we help the end user that we’re already connected to, via our smart thermostat or via our services and apps and these kinds of things, to also get ready for that next step? How can we help them identify nice opportunities, investments, if you may say so? Nothing in the sense that we’re really just upselling or marketing, but more like helping the user to identify that this is really something beneficial for them as well. That’s a whole new area where we’re building up some expertise right now.
I recall from one of our previous conversations, one of the most interesting use cases to me that Quby is working on is just about monitoring. A lot of people don’t want to have cameras in their home, looking at them or having their grandchildren set up cameras to monitor what they’re doing. And instead Quby is leveraging less intrusive technology like, hey, was there a toilet flush today? If there was, that’s my sign that grandma’s alive. Can you talk a little bit more about this less intrusive monitoring that Quby’s working on?
Yeah. Yeah. Yeah, the example you give is of course a bit of a sarcastic one. This is something we’re looking into, but it is truly very hard — like, can you see that my grandma’s still alive? That my grandma’s still doing fine? That’s something we are looking into, but that is really an area that is, I would say, so hard that it’s slowly progressing. But in the meantime, we of course try to build up some expertise on how to recognize, in a non-intrusive way, whether somebody is at home or not. Whether some things in people’s houses are happening because they’re intended, or happening because it’s just a mistake or just something that was forgotten. And with that I mean: oftentimes people are just leaving the heating on at night, but that’s just because they forgot to turn it off, or they leave it on all day because they went out to do groceries — I’m just giving some examples, but lots of things are happening in the house that the people in the house are not aware of.
Or maybe the solar panels used to work fine, but somehow something went wrong and they stopped delivering back to the grid. That’s something you want to be notified about, but you don’t want somebody filming your house the whole day or anything like that. This is something that we basically try to build alerting kinds of services for, so that the user is informed at the right moment to take action, without having the feeling that he’s being creeped on, being watched over like Big Brother the whole time. And that is a fine balance that we also need to figure out together with our UX department: how to do that using the data, but also using the experience of the user in a good way, to get the most out of it.
Well, that naturally segues into my next question: what’s the customer sentiment around this type of monitoring? You have these privacy and governance concerns that you need to be attentive to. What are some of the concerns, especially given that you’re a European company, that you actually have to address now?
Yeah. Yeah. Good question, Denny. I think this is something that lots of people ask us all the time, but in Europe, GDPR compliance is a big thing. And actually, when that came about, it was just necessary to transform everything we were doing, to make the updates to be compliant. But to be honest, that only gave us good stuff, because we were separating things that should not be together. We were designing the infrastructure in such a way that we had more control over where the data was sitting and where it was not sitting, and how to delete stuff, for instance. And it just made our platform more robust in a sense, although of course the end user doesn’t really notice that. But when you think about this kind of monitoring and what it means for the end user, we always try to think of three things. Of course we have our privacy documents that people sign when they get one of our products, so to say.
Yeah, exactly. Our privacy contracts, and they are generic enough to allow particular things, but also specific enough that we can’t just completely start doing something else and sell their data to third parties that they never knew of. But this is something that’s just law. Also, when you think of it as an engineer, as a designer, I think you should always think about a couple of things, namely: okay, if I were an end user of this product, would I find it useful and normal that this company now starts this new feature and does this thing with my data? If the answer to that is yes, then it’s probably a good idea to create that feature. And it’s probably also legal — of course you check that — but at least it seems like a good idea. Then there’s basically a gray area where you’re like, hmm, would the user like that? I’m not exactly sure.
Then the answer to the question “would the user like it?” is: ask them, ask permission. Don’t do anything before you have the permission of the customer to start doing that feature for that particular customer. And the last, most extreme red zone, if you can call it that: if you already have a creepy feeling yourself about a particular feature or service, then it’s just not a great idea. And of course, again, what I’m saying now is not exactly legal stuff, but it’s just the way I think you should always start, before you even start checking the law and before you even start doing something.
Well, that’s great that GDPR had such a positive impact on how you decoupled your services and how it made your whole system much more robust. How do you manage the machine learning models? I know you’ve mentioned that you use Spark for your data processing and Delta for storing your data. How do you manage all the different machine learning models that you’re building across all these different use cases? Some are in Scala with Spark ML, some are in Python with scikit-learn, et cetera.
Yeah. That’s an interesting question, Brooke, because that’s also something that, well, me personally, I’ve taken a recent interest in over the last couple of months, but also something we of course started taking care of from the beginning as we were deploying more and more models. But to be exactly honest — the story I told you at the beginning, how it started out for Quby with the first data-driven service and then getting the Databricks platform in place to do these kinds of things — this was not there directly from the start. The monitoring of these models was not there from the start, but was something we added later, in retrospect, seeing that it was really necessary. And we’re still working on it right now, because I think you’re never done in that sense. You can always learn, because you always monitor what you think can go wrong, but sometimes something goes wrong that you didn’t expect, and then you probably want to add some more monitoring so that next time you can also catch that edge case, I would say. And how are we doing it?
It’s also funny, because at the next Data and AI Summit in Europe, a colleague of mine and also an ex-colleague of mine, who’s now actually at Databricks — I’m talking about Shekh and Aemro — are going to give a talk about exactly this topic. To all the people listening: tune in. This will be a top advertisement for my colleagues’ talk, but they’re going to talk about it more in depth. What we really try to do is keep an eye on all data coming into these models and coming out of these models. We monitor the data quantity and quality levels all throughout our pipelines to see if everything is as expected. But what we’ve also started doing recently is adding more monitoring on top of the models themselves.
Basically every day, especially for the daily trained models — which are just trained on the data of the previous day in order to make a prediction or a classification or a clustering for that particular day — we started logging the metrics more structurally and putting some alerting on top of them, so that in case they change by a considerable amount, we are notified and our attention is drawn to that particular model, because something is not as expected. And then of course the question is: what should you do then? That’s also something we are learning now. Sometimes it’s just like, okay, that’s logical, because the error of our heating model went up because the heating season started. That is kind of logical. But sometimes something happens that was not expected, and then maybe it’s good that you get alerted, instead of the pipeline passing and succeeding while the quality’s not really there.
I have a question for you about alerting and monitoring in a second, but I do have to clarify, since we hired Aemro, who’s presenting that talk at Data and AI Summit, from a Databricks customer: he was relocating to the States. And so Quby had said he couldn’t leave — we’re not allowed to hire from customers.
Also from my side: we are very happy for Aemro — the whole data team, the whole of Quby. He was indeed moving to the States, and that’s why he’s now employed by you guys. I think it’s really nice for him. If you’re listening, Aemro: congrats again.
All right. Now my question about alerting and monitoring for you: what type of metrics are you keeping track of with your models, and also what types of alerting tools are you using? Are people getting paged if you detect model drift? What tools do you have in place? Does it automatically regress to an earlier version of the model?
Yeah, so to be totally honest, some of the things that I’m going to say now are still experimental and not really at the end stage of how we envision them at Quby yet. But what we actually recently started doing is, we kind of started abusing MLflow a bit. Normally you use MLflow while training your models to check how they compare, and then you know which one performed best over all your metrics, and that’s the one that you deploy to production. But what we started doing is, for all our daily trained models, we also started logging a couple of metrics every day for that particular day’s sample of data. Think of just absolute error, but also metrics that say something about the distribution of the input data — averages of some particular features and these kinds of things. And that daily log we actually observe over time, to see what these metrics look like as a time series.
And then if the time series of the metrics moves in an unexpected direction, that’s when we put an alert on top of it. For now, we just do that in a notebook. We read the data from the MLflow log, we check for particular increases and decreases, and then we get an alert in Slack. It’s maybe still a bit of a proof of concept, but it’s something that is working for us.
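[Editor’s note: the check Ellissa describes — watching a logged metric as a time series and alerting on unexpected moves — could be sketched as below. The metric values and the three-sigma threshold are made up for illustration; in practice the history would be read back from MLflow and the alert posted to a Slack webhook rather than printed.]

```python
from statistics import mean, stdev

def metric_drifted(history, latest, n_sigma=3.0):
    """Flag the latest daily metric if it sits more than n_sigma
    standard deviations away from the recent history."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > n_sigma * sigma

# Daily mean absolute error of a hypothetical heating model,
# as it might be read back from an MLflow metric history.
mae_history = [0.41, 0.39, 0.43, 0.40, 0.42, 0.38, 0.41]

print(metric_drifted(mae_history, 0.40))  # an ordinary day: no alert
print(metric_drifted(mae_history, 0.95))  # a sudden jump: alert
```

As Ellissa notes later, a triggered alert still needs human judgment — a seasonal shift in heating error is expected, while an unexplained jump warrants investigation.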
It already helped us to upgrade a particular model that we were not completely happy with, and to do that in a very informed way. We could actually check how the old model had performed over a long period in history, and how the new model would have performed if it had been deployed over that history. But also, for a considerable amount of time going forward, we kept tracking how the old model would have done alongside how the new model that was already in production was doing. And maybe “abuse” is not the right word, but I still feel we abused MLflow a little bit to do that. It really got us what we needed, though, and I think it was just a creative way to get the job done.
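[Editor’s note: the old-versus-new comparison Ellissa mentions — tracking both models over the same window before fully committing to the new one — might reduce to logic like this. The error numbers and the 5% improvement threshold are invented for illustration.]

```python
def mae(errors):
    """Mean absolute error over a window of daily errors."""
    return sum(abs(e) for e in errors) / len(errors)

def keep_new_model(old_errors, new_errors, min_gain=0.05):
    """Stick with the new model only if its error over the tracking
    window beats the old model's by at least min_gain (here 5%)."""
    return mae(new_errors) < mae(old_errors) * (1.0 - min_gain)

# Daily errors tracked side by side: the old model keeps scoring in the
# background while the new one serves (hypothetical numbers, e.g. read
# back from two MLflow runs).
old_model_errors = [0.50, 0.47, 0.52, 0.49]
new_model_errors = [0.31, 0.29, 0.33, 0.30]

print(keep_new_model(old_model_errors, new_model_errors))
```

Requiring a margin rather than any improvement at all guards against swapping models on noise in a short tracking window.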
No, I think it actually makes sense. And by the way, I’m a huge fan of sending alerts through Slack versus flooding my email. In the past, I’ve seen email threads galore and you’re going, oh great, I’m not going to be able to tell what’s what. Which actually segues to my next question: how did you do all this alerting and tracking prior to MLflow? How did you do all this in that case?
Not. Not, yeah, that’s true. I think — what always happened before we started working with MLflow, which we actually started working with quite soon, as soon as it was launched in the Databricks environment: we were oftentimes just training an offline model, checking if it was doing well for the samples we had at our disposal, and then we were just serving that as an artifact. And once, after a couple of months, we had a feeling that it might need an update, we would do the update, so to say. Similarly with the online models, of which we didn’t have too many around that time — those were also just tried on the data we had at hand when we were building them. They were tried on the historical data we had, but once they were in production, the kinds of metrics I was just talking about were not tracked anymore in such a robust way.
In that sense, MLflow helped us a lot to get that going, but also, especially for the offline trained models, to get better bookkeeping, so to say: what was done in the past, how things are working right now, and what the next step is to take. Even apart from the alerting that I was talking about earlier, that has helped the team a lot to keep track of stuff, because in the past things sometimes got lost in notebooks, and that is not really happening anymore.
Well, that’s great that MLflow has helped you track your models, your parameters, your metrics, et cetera. I know I’ve worked with a lot of customers, and when I ask them, “All right, how do you currently keep track of how well your models are performing?” they’re like, “Yeah, we have a spreadsheet, either a Google Sheet or an Excel sheet.” And it’ll be like, underscore final, final LRV3. And then going back to find where that was — or sometimes they write it down on a piece of paper next to their desk, they go home, and crap, what was that model’s performance? So there’s just a lot of reinventing the wheel. I really like how MLflow standardizes everything so you can share experiments back and forth. What is one of the coolest things that you’ve been able to do now that you can share your model experiments with other colleagues? Have you found that it reduces duplicate efforts? Have you found that it reduces your time to compare models? What have been some of the performance benefits of using MLflow?
Yeah. What I really — how do you say? — noticed myself in projects is that it’s just way easier to compare if you, say, are pair programming, or even working together with a couple of data scientists in a particular project where you need to get a new model out. What we sometimes do is have a bit of a competition going, seeing who can get the best model. You can iterate really quickly, but still compare in a fair way — apples to apples instead of apples to pears, so to say. Is that an English thing to say? It isn’t the same, but you want to compare in a fair way. You want to compare on the same metrics, calculated in the same way, and basically on the same data, and you want the constraints to be the same for everyone. I think MLflow keeps track of that nicely, so that we can actually cooperate and iterate on each other’s models, but also compare them fairly, and then go from a quick iteration of ideas towards a good model very quickly.
Yeah. In the States we say apples to oranges, but that might just be because California and Florida grow so many oranges.
Oh yeah, sure. Oops.
Well, I want to thank you, Ellissa, for joining us today to talk about how you combine MLflow with your lakehouse to help customers save energy and to help the environment as well. We really appreciate you taking all the time that you did with us today to chat about your experience as a machine learning engineer. Thank you for joining us on Data Brew.
Thank you for having me. It was a pleasure.