Deploying and managing machine learning models at scale introduces new complexities. Fortunately, there are tools that simplify this process. In this talk we walk you through an end-to-end, hands-on example showing how you can go from research to production without much complexity by leveraging the Seldon Core and MLflow frameworks. We will train a set of ML models and showcase a simple way to deploy them to a Kubernetes cluster using sophisticated deployment methods, including canary deployments and shadow deployments, and we'll touch upon richer ML graphs such as explainer deployments.
Speaker: Adrián González Martín
– Hello everyone. Thanks for joining my session on Seamless MLOps with Seldon and MLflow. You may be wondering who I am, so first, introductions. My name is Adrian. I'm a Machine Learning Engineer at Seldon. I joined Seldon about a year ago. Before that I took a master's in machine learning, and before that I spent a few years working as a software engineer. So I have that kind of mixed background between engineering and machine learning, which is what really makes me very interested in these engineering questions around machine learning and machine learning systems.
Next you may be wondering what Seldon is. Seldon is a company, a very small startup, that focuses on machine learning deployment: essentially, how to take your models from the training stage to the production stage. We are right now around 22 people and we are very passionate about open source. In fact, our main product is open source, we have a few other libraries, and we also collaborate with some other big open source projects like KFServing, which is part of the Kubeflow umbrella. Also, shameless plug here: we are hiring, so feel free to reach out to me if you have any questions about that, or otherwise check that URL. Happy to help you with that.
Cool, so what are we gonna see today? We are gonna start by talking about MLOps and why MLOps is a hard problem. I'm sure that you must have heard a lot about MLOps already in this conference, so sorry for talking about it again, but I think it's really important to figure out what problem we want to solve before jumping into possible solutions. Afterwards we're gonna see how we can solve some parts of the MLOps problem by combining MLflow and Seldon. And then we'll do a quick demo showing some of the power of both and how well they complement each other.
So first of all, why is MLOps hard? If we think about any kind of machine learning project, usually the first thing that comes to your mind is: let's grab a data set and let's try to train a model with that data set. For that you just spin up your Jupyter instance and you try to train that model in your notebook. However, you soon realize that notebooks don't scale by themselves. It's very common to end up with this kind of pattern where you have a bunch of notebooks. It's very hard to know what's in each of them. It's very hard to test, it's very hard to version control, to code review, et cetera.
Just to clarify, I don't have anything against notebooks. Notebooks are great. But they are a tool that has been really overused and applied to things that it wasn't designed for. We need to think of notebooks as what they are: scripting for machine learning. If we take the analogy of classic web apps, for example, you wouldn't take a script and just put it into production. You want to think about testing. You want to think about CI/CD. You want to think about containerization, et cetera.
And in fact we need to keep in mind that training is usually not the end goal. We want to somehow expose our model, we want to make our model usable by people. And it could be people, it could be third-party services, it could be other data science teams within the company, it could be other things. But training is just a part of the machine learning life cycle. So what does this process usually look like? This is just a very high-level overview. You usually have data processing, where you would do any kind of data cleaning, any kind of feature processing, et cetera. Then you go into training, where you would do the actual training of your model, experiment tracking, et cetera. And then you would deploy that to serve your model.
And each one of these steps is actually pretty massive. If we think about how notebooks work, they just can't fit this entire life cycle. If we look at data processing, for example, we need to think about keeping track of the data lineage, managing the different workflows that we need to clean our data, and labeling our data. If we think about training, we need to think about how to distribute the training, how to do experiment tracking, et cetera. If we think about serving, we need to think about packaging our model and then all the other day-two concerns like monitoring our model, audit trails, et cetera.
So where does MLOps fit in this entire life cycle? I took this definition from Wikipedia. I know that you shouldn't quote Wikipedia, but I was looking for a definition of MLOps and I think this one just nails it: "MLOps, a compound of machine learning and operations, is a practice for collaboration and communication between data scientists and operations professionals to help manage production ML lifecycle." From here we can see what kind of things MLOps tries to solve, and we can also infer what things make MLOps hard.
If we think about the different challenges that we find in the machine learning life cycle, on one hand we have a wide range of very heterogeneous requirements. For example, if we focus on the training part, we would usually have different data science teams in our organization. Each one of them may be using its own framework, maybe because they prefer it and have expertise in that framework, or maybe because they use it for a specialized purpose which makes sense for that framework.
And you don't want to constrain which frameworks they use. For example, one of them may be using TensorFlow, another one may be using PyTorch, another one may be using XGBoost. So here you really have an axis of variability that you need to think about when you design this ML life cycle.
You also need to think about the different infrastructure requirements that you may have across each one of these stages, because they are very different. For example, the infrastructure required by the training stage may mean a massive range of GPUs, because you want to distribute your training across all of them. But serving may not need all of that power, and you don't want to pay for that power in serving. So you need to think about these kinds of concerns.
Some of the steps in the ML life cycle are also technically very challenging. If you think about monitoring your model, for example, this is not measuring CPU, this is not measuring memory. This may be detecting that a data point at inference time is an outlier, or that the data coming in at inference time is drifting away from your training set. These are technically challenging problems by themselves. You also need to consider that each one of the models that you train needs to go through this entire life cycle, so you need to think about how to scale that up.
And last but not least, and I think this is the most important point, there is the organizational challenge that MLOps gives you. If we think of DevOps, DevOps was meant to break down the barrier between the engineering teams and the system administrators. There was a wall in between and DevOps was trying to break that wall. In the end I guess it was kind of a mixed result, but let's not get into that. If we think then about MLOps, to that set of walls we need to add another one with the data science team.
And we need to add another one, probably, with the data engineering team. We need to think about how to overcome each one of these barriers. DevOps, for example, tried to solve this by automating as many processes as possible, so that you would give the power to engineers, for example, to own their particular infrastructure. And we're gonna see how we can reach a similar level so that data scientists can own their models in production as well.
Now, how are we gonna do that? In this session, what we're gonna see is how MLflow for training and Seldon Core for serving fit that purpose. And we're gonna start with MLflow. So first of all, what is MLflow? You are in this conference, so I imagine that you already know quite a lot about MLflow, but just to get our context together: MLflow is an open source project. It was initially started by Databricks and is now part of the LF AI. They donated the project to the LF AI, I think, at the beginning of this year. By the way, I think the LF AI is probably not called LF AI anymore. I think it is now LF AI & Data, but let's not get into that. Essentially, "MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry." This is a quote coming from their website.
What does this look like in practice? MLflow manages different concepts that let you model this life cycle, at least on the training stage. First of all, you would have an MLflow project. The MLflow project is kind of the superset of what you're trying to achieve with a particular model, so it would be comprised of different experiment iterations, et cetera. So you would have your MLflow project, and you would run experiments within your MLflow project.
These experiments and their results are tracked in an MLflow tracking server, which is essentially responsible for holding all the results of our experiments and all the different hyperparameters that we set on each iteration. You can also log the output of these experiments as a trained model. This would capture, for example, the different iterations of your model over time.
In the demo that we'll see later, we are gonna serialize these models to our local file store. However, in a production setting you'd probably serialize them into a cloud storage bucket somewhere. This could be something like S3, it could be something like MinIO, a local MinIO cluster in your infrastructure. It could be Databricks DBFS if you wanted to keep it all in house with Databricks, or it could be Google Cloud Storage, et cetera. I think MLflow also supports a wide range of storage providers.
On top of this, you also have the MLflow model registry. The MLflow model registry essentially allows you to keep track of each one of these iterations of each model, as well as managing some metadata that comes alongside each model. Now, what is this metadata? Metadata here could be who trained this model and when they trained it, but it could also be more abstract things like which stage this model is in. Is this model considered a testing experiment? Is it a stable experiment? Has it been approved by someone? As well as any kind of arbitrary metadata.
So to recap, we have these four components. Of these, probably the most important, and the ones that we're gonna focus on in the demo to keep things simple, are the Project and the Model. The Project, to re-emphasize, is where you would define your environment (that is, your dependencies and their versions), your set of parameters (which parameters your model takes), and how MLflow can train and interact with that model.
And then the MLflow model would be the snapshot, the serialized version of a particular experiment iteration. What do these actually look like? On one hand, the MLflow project is encoded into an MLproject file. This MLproject file has a name, which here would be mlflow-talk, and also has a pointer to a description of an environment. Here we are describing our environment using Conda, and we've got a Conda yaml file which essentially has a list of all of our dependencies with the version pinned for each of them. It also defines the parameters of your model. For example, here we would have two parameters, alpha and l1_ratio. As you can imagine, these parameters usually correspond to actual hyperparameters of your model. These are the things that you tweak on each experiment iteration to optimize your model.
Lastly, it has the command, that is, how you can interact with any arbitrary training script. This is an important point: MLflow tries not to get in the way of the training process. Let's say you've got something super complicated. Instead of getting in the way, it just lets you define where it needs to plug in those parameters in order to train a model, and then the script itself is the one that will be capturing the output of that model. Now, how can we run this? If you've got the MLflow package installed, it's as simple as doing an mlflow run, pointing to where your MLproject is and setting the hyperparameter values that you want, and that's it.
On the other hand, the output of this training is an MLmodel. This is a snapshot of your model. This is something that MLflow does underneath when you log a model. It's gonna create an MLmodel file which essentially has flavors, which is an MLflow concept: they are different ways of using and importing back your models.
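As a rough sketch of the MLproject layout and the `mlflow run` invocation described above: the real MLproject file is YAML on disk, so the dictionary below only mirrors its structure. The script name `train.py` and the default value for `l1_ratio` are assumptions made for illustration, not taken from the talk.

```python
# Illustrative only: mirrors the layout of an MLproject file (which is YAML on disk).
ml_project = {
    "name": "mlflow-talk",
    "conda_env": "conda.yaml",  # pointer to the Conda environment description
    "entry_points": {
        "main": {
            "parameters": {
                "alpha": "float",
                "l1_ratio": {"type": "float", "default": 0.1},  # default is an assumption
            },
            # train.py is a hypothetical script name
            "command": "python train.py {alpha} {l1_ratio}",
        }
    },
}


def build_run_command(project_dir, params):
    """Build the argv for `mlflow run`, passing each hyperparameter with -P."""
    argv = ["mlflow", "run", project_dir]
    for name, value in params.items():
        argv += ["-P", f"{name}={value}"]
    return argv


def render_training_command(project, params):
    """Show where the parameters get plugged into the entry point command."""
    return project["entry_points"]["main"]["command"].format(**params)


run_argv = build_run_command(".", {"alpha": 0.5, "l1_ratio": 0.1})
training_command = render_training_command(ml_project, {"alpha": 0.5, "l1_ratio": 0.1})
```

The argv list could be handed to `subprocess.run` on a machine where the MLflow CLI is installed; here it is only built, not executed.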
This is particularly relevant for the serving stage because it's gonna define how you're gonna use your trained model afterwards. For example, Seldon Core only supports the python function flavor right now, which means that your model needs to be able to get serialized into a python function, which is a more generic flavor in MLflow. Other things that we can see here: the model was pickled in a file called model.pickle, which sits alongside our MLmodel file, and you can also see the loader function, the function that we need to use in order to load it back. We also have, for each one of these models, a run id, which is essentially an id that links back to our experiment. We're not gonna show this in the demo, but it could be used, for example, to link a model running in production back to where it came from. This is a really powerful thing.
We can also see here the MLflow UI, which is essentially the MLflow tracking interface where we're gonna be able to see the output of our training. For example, here we can see two experiment iterations. One of them had both hyperparameters set to 0.1 and produced a set of metrics, and the other one has different values, which results in different metrics. Just to recap, each one of these experiment iterations will have a snapshot of a model attached. And we're gonna see later how in Seldon we can actually take this model and just deploy it to try it out, or to compare them against each other, or however we want.
Cool. So talking about Seldon, let's just jump into that now. First of all, you may be wondering what Seldon is or where it fits, and here I just like this view. Linking back to the MLOps problem, this is another view into why MLOps is hard.
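To make the idea of flavors more concrete, here is a minimal sketch of checking whether a logged model exposes the python function flavor. The dictionary mirrors what an MLmodel file (YAML on disk) for a scikit-learn model might contain; the exact field names, file name and versions are recalled from typical MLflow output and should be treated as assumptions.

```python
# Illustrative contents of an MLmodel file; field names and values are assumptions.
mlmodel = {
    "run_id": "<run-id>",  # placeholder linking back to the experiment run
    "flavors": {
        "python_function": {
            "loader_module": "mlflow.sklearn",  # module used to load the model back
            "model_path": "model.pkl",          # serialized model, next to the MLmodel file
            "env": "conda.yaml",
        },
        "sklearn": {
            "pickled_model": "model.pkl",
            "serialization_format": "cloudpickle",
        },
    },
}


def pyfunc_loader(model_meta):
    """Return the loader module of the python function flavor, or None if absent."""
    flavor = model_meta.get("flavors", {}).get("python_function")
    return flavor["loader_module"] if flavor else None


loader = pyfunc_loader(mlmodel)
missing = pyfunc_loader({"flavors": {"sklearn": {}}})
```

A serving layer that only understands the python function flavor could use a check like this to reject models that were logged without it.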
So essentially, here on the left-hand side you would have all of the data problems: the data processing problems, configuration, data collection, et cetera. That's what would happen earlier in the ML life cycle. You then have that black box in the middle, which would be the ML code. That would be your Jupyter notebook, or some other kind of notebook, but essentially the actual training code. And then you have everything else that comes on the right-hand side, in the green boxes, which is essentially what happens when you want to serve, to expose that model. This figure comes from a very famous paper by Google called Hidden Technical Debt in Machine Learning Systems, and it basically highlights the same MLOps problem in a different way. Seldon fits in that right-hand side. Those are the kinds of problems that Seldon tries to solve, and we're gonna see later how.
But first of all, what actually is Seldon Core? Quoting from the repo, it is "an MLOps framework to package, deploy, monitor and manage thousands of production machine learning models." Now, what does this actually mean? If we want to see this more graphically, Seldon Core allows you to go very easily from a set of model artifacts, which in the case of MLflow would be your serialized MLflow models, to an API that can be consumed to run inference on that model. And it tries to make this process as simple as possible. This process could be something as simple as just exposing a model, which is usually the first use case that you want to solve. But it could also grow more and more complex.
Say you wanted to add multiple models to the same inference graph and maybe have some kind of smart router between them. Or maybe you are interested in having some kind of pre-processing, so that you can transform the data coming into the public endpoint in some way. Maybe this makes sense in NLP settings, for example. Or it can grow even more complex. Let's say you still have an MLflow model, but you've got some kind of custom requirement in your organization which means that you can't use the out-of-the-box inference server that comes with Seldon for MLflow. You're still allowed to define your own set of Docker images that run this inference server, so you could extend the existing one to add any custom logic.
All of this information is captured in what's called a custom resource definition. A custom resource definition is Kubernetes terminology for an abstraction that lets you encode an architectural pattern that you know about into a single resource in Kubernetes. For example, here the Seldon deployment CRD encodes the architectural patterns required to deploy machine learning models in Kubernetes into a single resource that just holds your model configuration. We're gonna see later what this model configuration looks like. But encoded in here you could also have any kind of advanced monitoring, logging, et cetera.
Something else that also comes out of the box with Seldon, something that we just mentioned briefly, is a set of inference servers that allow you to deploy models trained in the most common ML frameworks. Also worth mentioning, although I just mentioned it briefly: Seldon Core makes heavy use of the Kubernetes APIs. So it's a cloud native solution that only runs on top of Kubernetes.
The plus side of this is that it can run on all major cloud providers, basically because you can have a Kubernetes cluster in each one of them, or even on on-prem solutions using something like OpenShift, which is like an easier to use Kubernetes distribution. Well, not easier to use; more robust, enterprise ready maybe.
Now, how does this work? What does this CRD actually look like? We've got here an example of an instance of the Seldon deployment custom resource, which is what we've got on the left-hand side. We can see that it has a name, example model, and that it defines a single predictor which has a graph field. Inside this graph field we've got the definition of our inference graph. We can see that it first has a transformer node, which is gonna transform the input that comes into the inference graph from the outside. We've got a combiner node that is gonna gather the output of two models, which are gonna receive this transformed input.
If we focus on these models, we can see that one of them, the one called classifier, has an implementation field that says it's an MLflow model. It's gonna use the MLflow inference server. This is one of the inference servers that come out of the box with Seldon, and it essentially allows you to just point to a set of weights stored in the cloud and expose your model from there without any further configuration. For example, here you can see that the model URI field just points to a URI in Google Cloud Storage. However, it still gives you the power to control each one of these inference nodes. In fact, if you look at the field just above, the component specs field, there we override the Docker image used by the other three inference nodes. So we are able to say: this node is just gonna run this Docker image.
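A sketch of what such a Seldon deployment resource could look like, written as a Python dictionary that serializes to a JSON manifest (Kubernetes accepts JSON as well as YAML). The container images, bucket path and predictor name are hypothetical; only the overall shape (transformer, combiner, two models, one of them backed by the MLflow inference server) follows the description above.

```python
import json

# Hypothetical images, names and bucket path, for illustration only.
seldon_deployment = {
    "apiVersion": "machinelearning.seldon.io/v1",
    "kind": "SeldonDeployment",
    "metadata": {"name": "example-model"},
    "spec": {
        "predictors": [
            {
                "name": "default",
                # Override the Docker image for the three custom nodes.
                "componentSpecs": [
                    {
                        "spec": {
                            "containers": [
                                {"name": "input-transformer", "image": "example.org/input-transformer:0.1"},
                                {"name": "model-combiner", "image": "example.org/model-combiner:0.1"},
                                {"name": "my-model", "image": "example.org/my-model:0.1"},
                            ]
                        }
                    }
                ],
                "graph": {
                    "name": "input-transformer",
                    "type": "TRANSFORMER",
                    "children": [
                        {
                            "name": "model-combiner",
                            "type": "COMBINER",
                            "children": [
                                {"name": "my-model", "type": "MODEL", "children": []},
                                {
                                    "name": "classifier",
                                    "type": "MODEL",
                                    # Prepackaged MLflow server: just point at the stored model.
                                    "implementation": "MLFLOW_SERVER",
                                    "modelUri": "gs://example-bucket/path/to/mlflow-model",
                                    "children": [],
                                },
                            ],
                        }
                    ],
                },
            }
        ]
    },
}


def node_names(node):
    """Walk the inference graph depth-first and collect the node names."""
    names = [node["name"]]
    for child in node.get("children", []):
        names.extend(node_names(child))
    return names


graph = seldon_deployment["spec"]["predictors"][0]["graph"]
manifest_json = json.dumps(seldon_deployment, indent=2)
```

Note that only the `classifier` node carries `implementation` and `modelUri`; the other three nodes rely on the images listed under `componentSpecs`.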
So I don't care about any prepackaged inference server, I just want to run this Docker image, and that's fine. In fact, this also lets you override anything that you can change in a pod spec.
Now, what happens under the hood when you apply this model? When you run something like kubectl apply on this deployment.yaml file, Seldon would create a bunch of resources for you. Probably the most important, the one we want to focus on here, is the Pod, and the Pod will just have a set of containers. Some of these containers are gonna map one-to-one to the inference nodes that you had in your deployment yaml, in your Seldon deployment custom resource. For example, we can see that we've got the input transformer node, the my model node, the classifier node and the model combiner node. And just to re-emphasize, the classifier node here is just gonna be running the prepackaged inference server for MLflow.
The other containers that we have in this Pod are injected by Seldon. On one hand we have an init container. The init container is responsible for downloading the model. In this case we have a single node, the classifier node, which is pointing to a model URI in the cloud, so the init container is responsible for fetching those weights and making them available to the model container. And it does this before everything else happens.
The second injected container that we can see is the orchestrator. The orchestrator is essentially responsible for receiving all input requests and moving them along the inference graph as it sees fit. For example, here it would receive a request and send it first to the input transformer to transform the input. It would then send it to my model and classifier and receive the output of those. Then it would send those down to the model combiner, get the output, and send it back to the user.
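The request flow handled by the orchestrator can be imitated with a toy, in-process simulation. This is not Seldon's implementation: the node functions below are made-up stand-ins for the containers, and the routing rules are a simplified reading of the description above.

```python
# Toy stand-ins for the four containers; the logic of each is invented for illustration.
def input_transformer(features):
    return [2 * x for x in features]


def my_model(features):
    return sum(features)


def classifier(features):
    return max(features)


def model_combiner(outputs):
    return sum(outputs) / len(outputs)


graph = {
    "fn": input_transformer,
    "type": "TRANSFORMER",
    "children": [
        {
            "fn": model_combiner,
            "type": "COMBINER",
            "children": [
                {"fn": my_model, "type": "MODEL", "children": []},
                {"fn": classifier, "type": "MODEL", "children": []},
            ],
        }
    ],
}


def route(node, payload):
    """Move a request through the graph the way the walkthrough describes."""
    if node["type"] == "TRANSFORMER":
        transformed = node["fn"](payload)
        # A transformer hands its output to its single child, if it has one.
        return route(node["children"][0], transformed) if node["children"] else transformed
    if node["type"] == "COMBINER":
        # A combiner fans the request out to its children and merges their outputs.
        outputs = [route(child, payload) for child in node["children"]]
        return node["fn"](outputs)
    return node["fn"](payload)  # MODEL: leaf node, just predict


result = route(graph, [1, 2, 3])
```

With the input `[1, 2, 3]`, the transformer yields `[2, 4, 6]`, the two models return 12 and 6, and the combiner averages them to 9.0.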
Now, focusing on these inference servers: they essentially allow you to deploy models coming from different machine learning toolkits very quickly, in a very streamlined way. You can see them as runtimes. For example, here in our Kubernetes cluster we would have a Seldon deployment custom resource deployed called model-a. Model-a specifies that it's gonna use the prepackaged MLflow inference server. It's just gonna point to a URL in the cloud to fetch the model-a weights, which are again the snapshot that came from our MLmodel, our MLflow model. However, you could also have a Seldon deployment custom resource called model-b that just points to the XGBoost prepackaged inference server, which means that you can just point your weights to a .bst file, that is, the snapshot, the output of training an XGBoost model.
This links back to one of the problems that we mentioned before. One of the problems with MLOps is the heterogeneous requirements that we can see across teams, even within the same organization. We've got MLflow, which solves for those at training time. With MLflow, you are able to give the user a unified training layer so that they can train and serialize models coming from any framework. Maybe not from any framework, but from most frameworks. And then with Seldon Core you've got a unified deployment layer that lets users deploy models coming, again, from any framework.
You're also not restricted to this subset of inference servers. You can also define your own custom one. For example, by just extending the interfaces in the Seldon Core package, you could roll out a Python inference server very quickly, maybe for some kind of custom requirement, maybe for a framework that we don't support.
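A custom Python inference server along those lines might look roughly like the class below. It follows the convention, as far as I recall it, that Seldon Core's Python wrapper looks for a `predict(X, features_names)` method and calls an optional `load()` hook; treat both method signatures as assumptions. The "model" itself is a made-up linear scorer so that the sketch has no dependencies.

```python
class WineScorer:
    """Hypothetical custom inference server; the model logic is invented for illustration."""

    def __init__(self):
        self.weights = None
        self.bias = None

    def load(self):
        # A real server would deserialize a trained model here.
        # These numbers are placeholders, not trained values.
        self.weights = [0.5, -0.25]
        self.bias = 5.0

    def predict(self, X, features_names=None):
        # X is a batch of rows; return one score per row.
        if self.weights is None:
            self.load()
        return [
            self.bias + sum(w * x for w, x in zip(self.weights, row))
            for row in X
        ]


scorer = WineScorer()
scores = scorer.predict([[2.0, 4.0], [0.0, 0.0]])
```

Packaged into a Docker image, a class like this could then be referenced from the component specs of a Seldon deployment instead of one of the prepackaged servers.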
However, Seldon also gives you out of the box a lot of pre-built things and opinionated solutions for all of what we can call day-two concerns. First you are usually concerned about getting your model into a server that you can query. But the next step is what happens once you have that model deployed. You need to think about monitoring, you need to think about logging, you need to think about a set of things that Seldon comes with out-of-the-box integrations for.
For example, for monitoring, out of the box we log a set of metrics, usually more DevOps-oriented metrics, that can be scraped by Prometheus. I'm not gonna spend more time here because we will see some of this later in the demo. Sometimes, though, DevOps metrics like memory and CPU are not enough. This is something that we mentioned before: when you're monitoring a machine learning model, you need to think about more advanced metrics, let's say. For those we leverage Knative to build an asynchronous pipeline that allows us to compute some things on each one of the data points that come into our model. The reason for leveraging Knative is that we can very easily build an asynchronous pipeline that won't affect the latency of our inference process. Because usually these are heavy things; usually each one of these is a machine learning problem on its own. For example, if we look at outlier detection, we use one of our other open source libraries, called Alibi Detect, which essentially implements for you a bunch of algorithms for outlier detection, drift detection and any kind of monitoring problem that you've got in your production machine learning system.
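The reason an asynchronous pipeline keeps inference latency unaffected can be shown with a small stand-in. The sketch below uses a background thread and a queue in place of Knative, and a simple z-score check in place of an Alibi Detect algorithm; all names and thresholds are assumptions for illustration.

```python
import queue
import threading


class AsyncOutlierMonitor:
    """Checks data points off the request path; a stand-in for the real pipeline."""

    def __init__(self, mean, std, threshold=3.0):
        self.mean = mean
        self.std = std
        self.threshold = threshold
        self.outliers = []
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, value):
        # Returns immediately: the check happens in the background thread.
        self._queue.put(value)

    def _run(self):
        while True:
            value = self._queue.get()
            if abs(value - self.mean) / self.std > self.threshold:
                self.outliers.append(value)
            self._queue.task_done()

    def wait(self):
        # Block until every submitted point has been checked (useful in tests).
        self._queue.join()


def predict(x, monitor):
    monitor.submit(x)  # fire and forget
    return 2 * x       # made-up model


monitor = AsyncOutlierMonitor(mean=0.0, std=1.0)
first = predict(0.5, monitor)
second = predict(10.0, monitor)
monitor.wait()
```

The prediction returns without waiting for the outlier check, which is the property the asynchronous design is after.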
We also leverage Alibi Detect to build a drift detection pipeline, an asynchronous pipeline running Knative under the hood, that is gonna run drift detection for you, or any set of custom metrics, which could involve something relevant to your use case or could be things like accuracy if you have access to ground truth. These can go back to Prometheus, so they can be shown in Grafana if you want to, or you can set up alerts with Alertmanager, which is part of the Prometheus stack, to alert you whenever something wrong happens.
Other things that Seldon deals with are auditability. Following a similar architectural pattern as we saw before with Knative, Seldon Core also allows you to log any input payload and any value predicted by your model to Elasticsearch. This allows you to keep track of all the things that your model is predicting, so that you can go back and see what it did, which is particularly important in some industries.
Other day-two concerns are explainability. We have another open source library called Alibi Explain that essentially pre-implements for you a set of explainer algorithms that you can run on your model to explain predictions. So you can imagine now how everything links together. On this single custom resource you can define a set of things. For example, you can link your model with an explainer and an explainer type, so that any input payload that comes into your model can be explained very easily. And all of this is self-contained in that custom resource abstraction.
Other things that we can have in Seldon Core are advanced deployment models, which are particularly relevant for machine learning. You're gonna see more in the demo about why this is relevant.
So I'm not gonna spend much more time on this, but essentially you're able to run A/B tests, which we will do later, or shadow deployments, or other more advanced deployment models. And with that, I think we are ready to go into the demo.
However, before that, and just to set the stage right, I want to describe a bit more what we're gonna see. What we're gonna do in the demo is show all of these different pieces linked together. For that we're gonna think of an example use case. In particular, we're gonna build a wine e-commerce site, a website that just sells wine. As part of that, we want to provide a score for each one of the wines that we sell. For that we're gonna train a model that predicts wine quality for new wines. However, we also want to listen to feedback from the customers to see how well each one of the models matches the tastes of our customers. Which is where things get tricky, so we're gonna need to implement some kind of feedback loop.
But first things first. You would usually go and try to find a data set. We've got here a wine quality data set that just describes a set of features about your wine. I'm not an expert in wine, so I'm just gonna assume that this makes sense. And it has a quality score at the end of each row, that is, for each one of the wines. We will use an elastic net model trained on this data set to predict the quality score of each one of the wines. Now, elastic net is very simple. It's just linear regression with the addition of two regularizers, an L1 and an L2 regularizer, and a couple of hyperparameters, a and b, each associated with one of the regularizers. As you can imagine, we are gonna want to tweak these hyperparameters to find the best setting for our model. And it's important to remember that these are hyperparameters.
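Written out, the elastic net objective sketched above looks as follows, with $a$ weighting the L1 regularizer and $b$ the L2 regularizer. The $\frac{1}{2n}$ and $\frac{1}{2}$ scaling constants are one common convention rather than something stated in the talk.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; a\,\lVert \beta \rVert_1
  \;+\; \frac{b}{2}\,\lVert \beta \rVert_2^2
```

Under this convention, scikit-learn's `ElasticNet` parameters `alpha` and `l1_ratio` (the two parameters named in the MLproject file) map to $a = \texttt{alpha}\cdot\texttt{l1\_ratio}$ and $b = \texttt{alpha}\cdot(1 - \texttt{l1\_ratio})$.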
So during training we're gonna find the set of beta weights, the beta coefficients, but we will need to think of how to tweak a and b to find the right performance. You can imagine how we're gonna do this with MLflow, at least for the training side. We're gonna have an MLflow project, the wine project. We're gonna run experiments, we're gonna keep track of those results in an MLflow tracking server, and for each one of these iterations we're gonna log the trained model, which will usually have a different set of hyperparameters.
Now, each one of these models will be serialized into a Google Cloud Storage bucket. As I mentioned earlier, for the demo this is just gonna be a manual process. In production you would have MLflow directly configured to push models to Google Cloud Storage, so that you don't need that extra step, and you would usually even have the model registry on top to trigger deployments very easily.
Now we're gonna have different iterations of our models, so one of the problems is gonna be comparing them against each other to see which one is best. You could argue: well, we've got a training data set that's built just for that, so let's just look at that. But maybe, just maybe, when you deploy these models into production, the performance reported by your training doesn't match what you see in production. This can happen, for example... well, I'm not sure about wine, so here's a big disclaimer: I'm not an expert in wine. But I can imagine other things. For example, in clothes, in fashion, tastes change. What a customer likes changes over time. There are trends, there are things that you don't think about sometimes, or even new generations of people that like different things. So it's very important to compare these models in production. It's hard to tell which version of the model is best on a workbench. You need to put them in production.
And for that, what we’re gonna do is use this feedback loop to read what the customer thinks about a particular wine, and we’re gonna do an A/B test between two versions of the model to see which one is better. So, thinking about how we’re gonna put that in production, the serving step would be defining a SeldonDeployment custom resource that’s gonna have these two models and is just gonna define a 50/50 split between them. Essentially, it’s gonna pull the models from the cloud storage where we put the trained models, it’s gonna deploy them in a Kubernetes cluster, and it’s gonna expose that to a user.
This looks quite complicated, but if you look at the custom resource, it’s very, very simple. Essentially you just have a SeldonDeployment CR which has two predictors. Both of them are gonna use the MLflow inference server. The first one is just gonna point to the weights of model-a, which is one of the iterations of our training, and the second predictor is just gonna point to the weights of model-b, which is the second iteration of our training, or a different one. And it’s just gonna define a 50/50 split of traffic between them.
And lastly, what we’re gonna do on top of this is add a feedback loop that is just gonna receive a reward signal. Now, this is something built into Seldon: it accepts feedback on its predictions. You can sometimes give that feedback back with the ground truth, but most of the time you’re not gonna have the ground truth. For example here, where’s the ground truth about wine quality? If you had that, you wouldn’t need this. So what we’re gonna do is build a rough reward signal that is gonna come from the value that the customer gives to that wine. So a customer is gonna buy a wine which has a score of seven and is gonna say, well, this is actually a five, or a four if I’m generous.
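To make the two-predictor custom resource described above a bit more concrete, here is a minimal sketch that builds such a manifest as a Python dictionary and serializes it to JSON. The field names follow the Seldon Core v1 SeldonDeployment schema as far as I know it. The resource name, the predictor names and the `gs://example-bucket/...` URIs are placeholders invented for this sketch, not the ones used in the demo.

```python
import json


def mlflow_predictor(name, model_uri, traffic):
    """One predictor with a single-node graph served by the MLflow inference server."""
    return {
        "name": name,
        "replicas": 1,
        "traffic": traffic,  # percentage of requests routed to this predictor
        "graph": {
            "name": "wines-classifier",  # hypothetical node name
            "implementation": "MLFLOW_SERVER",
            "modelUri": model_uri,
            "children": [],
        },
    }


deployment = {
    "apiVersion": "machinelearning.seldon.io/v1",
    "kind": "SeldonDeployment",
    "metadata": {"name": "wines-classifier"},  # hypothetical resource name
    "spec": {
        "predictors": [
            # Placeholder bucket paths: point these at wherever the models were uploaded.
            mlflow_predictor("model-a", "gs://example-bucket/mlflow/model-a", 50),
            mlflow_predictor("model-b", "gs://example-bucket/mlflow/model-b", 50),
        ]
    },
}

# kubectl accepts JSON as well as YAML, so this string could be saved and applied as is.
manifest = json.dumps(deployment, indent=2)
```

Changing the two `traffic` values (for example to 90 and 10) would turn the same resource into a canary-style rollout instead of an even A/B split.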
So we are just gonna take that difference, or rather the inverse, because the reward has to be a positive signal, to say how well our model is performing. And we are just gonna see how these models compare in Grafana. So we’re gonna report those metrics and we’re gonna see in real time how they perform, how they compare. So with that we can jump straight into the notebook and have a look.
Right, so I’ve got the notebook open here. First of all, I just want to get these out of the way: there are a few prerequisites that I already have set up in my environment to save time, but essentially you need to have MLflow installed. This is all available in the repo, so feel free to go to it and you can just run it. We’ve got some Python requirements. The example also assumes that you have a Kubernetes cluster set up, that you have access to that cluster, and that you’ve got Seldon Core installed on it. I’ve linked the instructions here on how you can do this fairly easily. And last but not least, you also need Grafana and Prometheus installed on that cluster. In here, I’m just using a Helm chart that Seldon Core provides, which essentially installs Grafana and Prometheus pre-configured to work with Seldon Core, and also creates a model dashboard for you out of the box.
Cool, so first let’s go into training. In here, I’ve got an MLproject file defined. This is very similar to the one we saw in the slides before. It just has this environment, this conda.yaml environment. We can look at it and it just has a few dependencies set up. It’s fairly simple. One of them is scikit-learn, pinned to a specific version. Scikit-learn is very sensitive to the version, particularly when you want to serialize and deserialize it. So it’s good to have it pinned to a particular version.
Now, with this in place, we can just run our training. In here, for example, I’m just gonna run it setting the alpha hyperparameter of the ElasticNet to 0.1, and we can see it runs. This is gonna read that conda.yaml, it’s gonna install any missing dependency, it’s gonna run our training script, and it’s gonna save the output into our local folder in this case. I’ve pre-run a second iteration setting alpha to 1.0. Now we can look locally at what all these files look like. Essentially we would have just a bunch of folders, one for each iteration of our experiment. As I said before, usually you would have this in some kind of cloud storage.
We can also run the MLflow UI to see them, and here I’m just gonna switch very quickly to a very bright and white window, so sorry about the burnt retinas. But essentially here we can see all of the experiment iterations, and we can see that for each one we’ve got a setting of the hyperparameters. We know the user who trained them and we know the reported metrics. So in this example in particular, we’re gonna take these two, deploy them, and pit them side by side in an A/B test with production data. We’re gonna simulate production data. We can also look further at what this looks like. Essentially, each one of them has an MLmodel file which has, as we said, a flavor, for example the Python function flavor. It has some instructions on how you can read it back, as well as some metadata.
Now, here is the step that you usually wouldn’t need in a production setting, which is taking the output of your training, of each one of the iterations, and uploading it to a remote cloud storage bucket. Usually MLflow, when you log a model, would automatically push it to the cloud, or to some kind of storage bucket.
And the reason why we need this in this case is that Seldon, when it serves the model, is gonna need access to these weights. So in here I’m just gonna manually push them into a Google Cloud bucket, which is just gonna be called something like Seldon models, MLflow, model-a.
Now, moving on. The next step is gonna be to deploy our models. We have trained them, we’ve got them in Google Cloud Storage, and we’re going to deploy them. For that we just need to define a SeldonDeployment custom resource, an instance of the custom resource, where we can see a few things. First of all, it has a name, wines-classifier, and then it has two predictors. The first part of the predictor is very similar to what we saw earlier, which is just defining a graph where we are gonna have a single node, in this case one which is gonna use the MLflow server implementation, and it’s just gonna point to the weights, to the URI of model-a. And it’s gonna get 50% of the traffic. This is the first predictor, the model-a one.
We can also see here that it has a weird thing, a big chunk here. This is one of the trade-offs of having the dynamic environment that’s in the conda.yaml file. As we will see later, when you deploy a model with the MLflow server, it’s gonna fetch that conda.yaml, which again is stored here remotely at this URI, and it’s gonna instantiate that environment for you. Now, because this environment is completely dynamic, you can’t have anything pre-built. You need to do this at run time, for each one of the instances of your model, which is a fairly costly process. This is a trade-off of having a dynamic environment. So here, essentially, what we’re gonna do is tell Kubernetes: well, it’s fine that this container in particular takes more time to come up, just give it time, because it needs to install the environment. Which is essentially what we do here.
And we can do that because, through the component specs in the SeldonDeployment CR, you’re able to override any part of the pod spec that you want to change. Now, the alternative to this is to actually have a hard-coded environment with a set of dependencies there, which you can also do. It’s just a trade-off. You need to decide which one you want: the dynamic environment with slow performance when coming up, or a hard-coded environment with no dynamicity, or dynamicness, I don’t know. The second predictor is also gonna use the MLflow server, we’re just gonna point it to model-b, and it’s also gonna get 50% of the traffic. Now we’re gonna just apply it.
Now we can see what’s happening on our cluster. This is a view into our Kubernetes cluster where we can see that there is a SeldonDeployment resource getting created. This is the resource that we just created. We can look at the pods and we can see that it’s creating two different pods, one for each of our predictors. If we look at the first one, for example, we can see that first of all it’s instantiating the storage initializer container, which is just gonna download our models, if we look at the logs. And secondly, we can see that there is this MLflow server REST container, which is just the MLflow inference server. If we look at the logs of this container, we can see that it’s actually creating the Conda environment. It’s reading it from the conda.yaml and it’s instantiating everything. Now, this takes a while because it’s gonna need to download all the dependencies.
So while it’s doing that, you may also be wondering: okay, so we’ve got two pods, how does it know that it needs to do a 50/50 traffic split between them? When you do this kind of deployment, this kind of inference graph for an A/B test, Seldon by default will do that split at the ingress level.
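The probe override mentioned at the start of this passage could look roughly like the following sketch, again written as a Python dictionary. The probe fields (`initialDelaySeconds`, `periodSeconds`, `failureThreshold`) are standard Kubernetes ones. The concrete numbers, the `/health/ping` path, the port name and the container name are assumptions chosen for illustration, and the container name is expected to match the graph node it applies to.

```python
def slow_start_component_spec(container_name, initial_delay=80, period=5, failure_threshold=200):
    """Component spec entry that relaxes the liveness probe of one container.

    All default numbers here are illustrative assumptions, not recommended values.
    """
    liveness_probe = {
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
        "failureThreshold": failure_threshold,
        "successThreshold": 1,
        # Assumed health endpoint and port name of the model server container.
        "httpGet": {"path": "/health/ping", "port": "http", "scheme": "HTTP"},
    }
    return {
        "spec": {
            "containers": [{"name": container_name, "livenessProbe": liveness_probe}]
        }
    }


def startup_budget_seconds(component_spec):
    """Approximate time the container gets before Kubernetes restarts it."""
    probe = component_spec["spec"]["containers"][0]["livenessProbe"]
    return probe["initialDelaySeconds"] + probe["failureThreshold"] * probe["periodSeconds"]


# Hypothetical node name; this entry would go under a predictor's "componentSpecs" list.
component_spec = slow_start_component_spec("wines-classifier")
budget = startup_budget_seconds(component_spec)  # 80 + 200 * 5 seconds
```

A generous budget like this only hides the cost of building the environment at start-up. The pre-built image alternative removes that cost at the price of flexibility.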
The reason for doing it at the ingress level is that then you can leverage something like Istio to do that split between these two pods, which is gonna be faster than doing it in a single pod. So if we look, for example, at the virtual services in our cluster, once the pods come up, they will appear there. Now, because it takes a bit of time, I imagine it must be installing the dependencies now, I’ve got here in the background a namespace where these pods have already been created and are just up. We can see it has this virtual service already created here. And if we look at any of them, we can see that it has this 50/50 split between the two.
Now the next step is to test these models. We can see that the pods have been created. We can see that we’ve got a SeldonDeployment. We can now send a test request to this. And for this I’m just gonna send it to the backup namespace. So we send a request, the request is just going to this endpoint, and it just needs to follow this particular schema. And we can see that the output is just the predicted quality for that particular wine with that set of attributes.
Now, the next step is gonna be to actually simulate traffic and see how we can implement that feedback loop. For that we wrote this simple script, which is just gonna simulate incoming traffic based on the CSV which we used to train our model. Essentially, it’s gonna take each one of the rows, it’s gonna send it, and then it’s gonna send as a reward a sort of metric based on the squared distance between the predicted value and the actual value, and it’s gonna send that to the feedback endpoint. Now, I’m just gonna kick that off in the background here, and this script is gonna run forever. And if we go into Grafana, we should be able to see how this traffic starts to appear here, although Grafana takes a bit of time.
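The reward part of that script can be sketched as follows. The talk only says the reward is based on the squared distance and has to be positive, so the formula `1 / (1 + error²)` is an assumption picked here because it stays positive and peaks at a perfect prediction. The payload shape (`request`, `response`, `reward`) follows Seldon Core’s feedback message as I understand it, and the feature values are made up. Actually posting it to the feedback endpoint is left out.

```python
import json


def reward_from_rating(predicted, actual):
    """Positive reward in (0, 1]: 1.0 for a perfect prediction, smaller as the error grows.

    The exact formula is an illustrative assumption, not the one used in the demo.
    """
    return 1.0 / (1.0 + (predicted - actual) ** 2)


def feedback_payload(features, predicted, actual):
    """Build the body that would be sent to the feedback endpoint for one wine."""
    return {
        "request": {"data": {"ndarray": [features]}},
        "response": {"data": {"ndarray": [predicted]}},
        "reward": reward_from_rating(predicted, actual),
    }


# Made-up example: the model scored the wine 7, the customer rated it 5.
payload = feedback_payload(features=[7.4, 0.7, 0.0], predicted=7.0, actual=5.0)
body = json.dumps(payload)
```

In a loop over the rows of the CSV, each prediction request would be followed by one such feedback message, and the rewards would then show up per model in the dashboard.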
I think we should be able to go back a bit more, and we should be able to see here a previous run where we started to report the rewards for our models. For example, here you can see the pre-built dashboard for your models, and you can see how it’s reporting the reward for each of them. Besides the reward, it also keeps track of the requests that they are getting, and in here you should be able to see, I guess Grafana is just taking a bit of time to refresh, how the request rate for each of the models is actually 50/50. And that’s pretty much it for the demo.
Cool, so with that, thanks a lot for joining the session. Again, we are hiring, so feel free to reach out to me. Also, remember to give feedback about the session, any kind of feedback is welcome, and please feel free to ask any questions on the chat. Thanks a lot for your time.
Adrian is a Machine Learning Engineer at Seldon, where his focus is to extend Seldon's open source and enterprise machine learning operation products to solve large scale problems at leading organisations in the Automotive, Pharmaceutical and Technology sectors. When he is not doing that, Adrian loves experimenting with new technologies and catching up with ML papers. Before Seldon, Adrian has worked as a Software Engineer across different startups, where he contributed and led the development of large production codebases. Adrian holds an MSc in Machine Learning from University College London, where he specialised in probabilistic methods applied to healthcare, as well as a MEng in Computer Science from the University of Alicante.