Operationalizing Machine Learning at Scale at Starbucks


As ML-driven innovations are propelled by the self-service capabilities in the Enterprise Data and Analytics Platform (EDAP), teams face a significant barrier to entry and productivity issues in moving from POCs to operating ML-powered apps at scale in production. This talk is the journey of a team using the Starbucks AI foundational capabilities in EDAP to deploy, manage, and operate ML models as secure and scalable cognitive services that have the potential to power internet-scale inferences for use cases and applications.

Watch more Spark + AI sessions here
Try Databricks for free

Video Transcript

– Welcome to Brewing AI and Machine Learning at Scale at Starbucks. My name is Denny Lee, from Databricks, and I’ll be chatting with Balaji on how they’ve scaled their AI and machine learning environment. So let’s start off with having Balaji introduce himself. Balaji? – Thank you, Denny. My name is Balaji, and I am an engineering manager with the data, AI, and ML engineering team at Starbucks, and our team ships the Enterprise Data and Analytics Platform. At Starbucks, one of the world’s most admired brands, our mission is to inspire and nurture the human spirit, one person, one cup, and one neighborhood at a time. We operate over 32,000 stores worldwide, serving 20 million active customers on 95 million customer occasions each week. We also support over two million sustainable coffee farmers every year. While we are a coffee company at the core, technology and data are key drivers for operating at this scale, and we would be happy to share our AI/ML journey.

– Wait, so, how does AI and data relate to what Starbucks is known for, for coffee, and customer service? – Digital is one of the core focus areas for Starbucks.

Data and Analytics is a critical success factor

The Starbucks mobile app is very popular, and you might have come across our personalization initiatives, like product recommendations and personalized offers. We have a popular loyalty and rewards program that generates a ton of data for us, and that data needs analytics. The store is another critical focus area: improving the in-store experience and delivering partner-customer connections in the stores are key priorities. We have an awesome range of products that are continuously shaped by customer feedback, changing trends, and tastes. The theme that is common across all these pillars is data and analytics. There are multiple teams at Starbucks working on big ideas and breakthrough innovations, and machine learning at scale is a key enabler. – Oh, this is really interesting. So, I love the fact that you call out machine learning as a key enabler for Starbucks. Could you provide a little context around this, in terms of what your journey was from your original legacy systems to the foundation of AI and machine learning that you have now? – Sure, Denny, let me set the context of EDAP’s journey in the AI/ML space. There are critical initiatives like Deep Brew, next-gen personalization, voice ordering, and multiple retail optimization projects out there in the public. Everyone is trying to get their hands on machine learning. As the Enterprise Data and Analytics Platform team, our mission is to democratize analytics and machine learning at scale by providing trusted data, scalable platforms, and self-service. We lowered the barrier to entry for new AI/ML projects and unlocked rapid innovation across the company. A foundational need for any AI/ML project is data. We have the critical mass in EDAP, with petabytes of data co-located in subject areas such as customer, partner, and retail. Our curated data sets, data products, and real-time feeds are the backbone for critical analytics initiatives across the company. But it is not just about data.
We have deployed over 25 instances of BREWKIT playgrounds. These are Databricks-powered deep analytics environments. Teams are using our playgrounds to self-serve petabyte-scale analytics without compromising on enterprise security and governance controls. Many of our consumers build reinforcement learning models, deep learning models, and classical machine learning applications using the on-demand GPU compute, Spark clusters, and open source deep learning and ML frameworks supported by the platform. BREWKIT playgrounds help teams focus on creating analytical outcomes and not worry about data and infrastructure. But how do we power digital and in-store experiences? Today, it requires custom engineering and deployment to proprietary infrastructure that our team provisions and manages. As adoption of machine learning grows, this is not gonna scale. It also impacts time to market for new ideas and results in model fragmentation. With the AIRESERVE, we are providing an enterprise solution that takes the engineering and management complexities out of deploying models to production. We do that by providing on-demand managed ML infrastructure and automated process flows for moving models into production with complete traceability. We want our developers across Starbucks to easily discover and consume these models without having to be machine learning experts.

– Whoa, so this is really cool, actually. So your developers don’t necessarily have to be ML experts, they can easily create multiple playgrounds for machine learning and deep learning, and your data teams don’t have to worry about the infrastructure. But can you give me a more concrete example of an initiative that the EDAP platform actually solves? – Absolutely, Denny. I showcase here this little project. It is Theia, our computer vision and object detection proof-of-concept. We distribute it as a reference implementation for onboarding into the AIRESERVE, and that’s exactly what I wanna talk about. Just to touch on the complexity of this application: we ship multiple object detection models that can detect Starbucks products in images and video streams. We use frameworks like TensorFlow and experiment with other frameworks like PyTorch. The models are based on Faster R-CNN, YOLOv3, and SSD MobileNet. These models were trained on a million-plus images specific to Starbucks products. The images were annotated using capabilities from open source tooling and pre-trained models. We also make them available in a format consumable by leading frameworks like TensorFlow.

We have augmented the images not only to enrich our training data, but also to introduce noise that enables the neural nets to learn generalized features. These pre-trained custom models can then be incorporated into multiple use cases for Starbucks through AIRESERVE. Let’s see a couple of demos of how consumers use these models through AIRESERVE.

Consumers of Theia don’t care about the technical complexity of object detection, and they certainly are not machine learning experts. The AIRESERVE addresses this impedance mismatch between the analytics teams and the app developers.

This notebook leverages the secure AIRESERVE REST API to detect products in an image.

The first step here retrieves, from Azure Key Vault, the secret needed to connect to the AIRESERVE model endpoint.

We go on to write a few image conversion functions that we use to convert the image into a JSON payload that the REST API needs.

Next, we load a sample image from our library, convert it into a pixel array, and dump it into a JSON format for sending to the API.
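That conversion step can be sketched roughly as follows. Note this is a hypothetical illustration: the `image_to_payload` function and the `{"data": [...]}` envelope are assumptions, not the actual AIRESERVE schema, and the tiny hard-coded array stands in for pixels that the notebook would decode from a real image (e.g. via PIL and NumPy).

```python
import json

def image_to_payload(pixel_array):
    """Wrap a nested list of pixel values in a JSON body for the scoring API.

    The {"data": [...]} envelope is a guess at the request schema; the real
    AIRESERVE endpoint may expect a different shape.
    """
    return json.dumps({"data": [pixel_array]})

# A tiny 2x2 RGB array stands in for a decoded image here.
sample = [[[255, 255, 255], [0, 0, 0]],
          [[128, 128, 128], [64, 64, 64]]]
payload = image_to_payload(sample)
```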

As you see here, the sample image is a scone.

Let us see what the AIRESERVE hosted model thinks about it.

In this cell, we invoke the REST endpoint with the image and the auth token corresponding to the subscription.
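An authenticated call like the one in this cell could look roughly like the sketch below, using only the Python standard library. The endpoint URL, the bearer-token header scheme, and the placeholder token are all assumptions for illustration; in the actual notebook the token would come from Azure Key Vault (for example via `dbutils.secrets.get` on Databricks).

```python
import json
import urllib.request

def build_detection_request(endpoint_url, auth_token, json_payload):
    # Prepare the authenticated POST to the model endpoint. The bearer-token
    # scheme is an assumption; the real AIRESERVE auth flow may differ.
    return urllib.request.Request(
        endpoint_url,
        data=json_payload.encode("utf-8"),
        headers={"Authorization": f"Bearer {auth_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_detection_request(
    "https://aireserve.example/score",   # placeholder endpoint URL
    "<token-from-key-vault>",            # placeholder secret
    json.dumps({"data": [[0, 0, 0]]}),
)
# urllib.request.urlopen(req) would send it and return the detections.
```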

The model identified the right bounding box around the scone. It also detected this object to be a scone with a high confidence score.
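Conceptually, picking the top detection out of such a response is a small filtering step. The field names (`label`, `score`, `box`) below are hypothetical stand-ins for whatever the model's actual response schema is:

```python
def best_detection(detections, threshold=0.5):
    # Keep detections above the confidence threshold; return the best one.
    above = [d for d in detections if d["score"] >= threshold]
    return max(above, key=lambda d: d["score"]) if above else None

# Hypothetical response for the scone image in the demo.
response = [
    {"label": "scone",  "score": 0.97, "box": [34, 18, 210, 190]},
    {"label": "muffin", "score": 0.22, "box": [5, 5, 60, 60]},
]
top = best_detection(response)
```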

This is an example of how the developer gets to use AIRESERVE models without having to be a machine learning expert.

Here is another example of AIRESERVE serving the custom models on edge infrastructure. This demo app allows you to leverage some of our top-performing object detection models managed by AIRESERVE and deployed to NVIDIA GPU-enabled edge devices, performing autonomous inventory counts in real time.

We have named these models after major brew techniques, and they correspond to models based on Faster R-CNN, YOLOv3, SSD, and others, which are deployed to AIRESERVE. These choices help us arrive at the right prediction accuracy and performance trade-off for this deployment. As you hover over the models, you see their metrics and specs displayed.

For this demo, these models will detect six Starbucks products: cookies, bagels, bread, croissants, muffins, and scones. What you see on the right-hand side is the observed area. This could be store shelves or, in this case, a flat surface where the intelligent camera is pointing. As we place a tray of fresh products in front of the camera, the application identifies the objects and counts them by product category.
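The count-by-category step described above is essentially a tally over detected labels. This is a minimal sketch with invented field names, not the demo app's actual code:

```python
from collections import Counter

def count_by_category(detections):
    # Tally detected objects per product category for the inventory view.
    return Counter(d["label"] for d in detections)

# Hypothetical frame of detections from the edge camera.
frame = [{"label": "croissant"}, {"label": "croissant"},
         {"label": "scone"}, {"label": "muffin"}]
counts = count_by_category(frame)
```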

The app continues to report the status to the backend, unlocking key capabilities, like restocking alerts, and providing other insights on products.

As items move in and out of the tray, the app continuously tracks their positions.

This is a classic example of how we can elevate the customer-partner interaction by taking AI and machine learning to the edge, leveraging AIRESERVE. Let’s see how AIRESERVE helps a project like Theia. Theia is constantly evolving. Our engineers are experimenting with new models and frameworks. We are constantly training these models to detect new products and adding more training data to detect products better. We want to package these models into new use cases and also deploy them to new form factors. The AIRESERVE enables the project team to train, validate, and deploy the models at will, continuously, on a hardened ephemeral training infrastructure in a matter of minutes. It helps serve inferences to consumers as a REST API at digital scale and store scale, in a secured fashion, on an always-on platform. It enables deployment at the edge on intelligent devices to serve inferences, and finally it provides end-to-end traceability for the project teams. All of this, without spending months of infrastructure and MLOps setup time and days for deploying models into production.

– So wow, you’ve shown how this Theia application is able to go ahead and showcase object detection models and how it automatically solves the problem of infrastructure and deploying at scale.


But how do you make this available to everybody in Starbucks? How does AIRESERVE work for everybody else? – So there is a publisher and consumer experience for AIRESERVE.

As publishers, teams use a BREWKIT playground with EDAP for solution experimentation. The solution is version-controlled in Starbucks Git. Then, with the addition of an AIRESERVE solution manifest to the Git repo, you automate model deployment to production. Models are continuously trained and tested at scale on an ephemeral model training infrastructure. These models are then deployed and registered into the Starbucks Model Gallery for easy discovery and consumption as REST APIs.

All the metadata assets, including experiments, models, hyperparameters, model efficacy, images, and deployments, are fully traced over time.

The deployments are instrumented to enable monitoring and alerting that triggers the retraining process as necessary. Out of the box, the models are deployed on AIRESERVE managed infrastructure. These images can then be seamlessly extended to the edge, as we saw before, as long as the edge supports containers.
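The retraining trigger mentioned above can be thought of as a simple threshold check on a monitored metric. This is a minimal sketch with invented names and numbers, not the platform's actual monitoring logic:

```python
def should_retrain(live_metric, baseline_metric, tolerance=0.05):
    # Flag retraining when the monitored metric degrades beyond tolerance.
    return (baseline_metric - live_metric) > tolerance

# e.g. a baseline mAP of 0.81 observed drifting down to 0.70 in production
# would trip the alert and kick off the retraining process.
```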

– Well, that’s really, really cool. So you’ve got this infrastructure set up. Can you just demonstrate how a publisher onboarding works in that case, so how did they go do this?

AIRESERVE Consumer Experience: Discover and consume AI/ML models without friction

– Absolutely, Denny. Publishing of models is done in three steps. As discussed, the project team prepares the project for onboarding by adding the solution manifest. We have an AIRESERVE SDK published to our community for this purpose. The second step is for the platform team to onboard the project into AIRESERVE through our platform automation. Once the automation runs, the models are continuously built and published into AIRESERVE from that point onward.

The first step in publishing is preparing the project for onboarding into AIRESERVE. And as we discussed, we add a solution manifest into the Git repo of the project in order to automate the full model flow. Let’s take a quick look into the manifest.json that is checked into this Git repo.

Looking at the manifest, we see that we are able to right-size the training cluster for this workload through configuration. The configuration here shows that training will happen on a Spark cluster with a GPU instance type. The platform will eventually support training on other compute infrastructure, like Kubernetes and other Azure PaaS services.

The project can also define the resource requirements for the inference endpoint.

And this would be used to deploy the right configuration in Kubernetes.

The project owner then goes on to define the experiment with a series of models that need to be trained for this project. Each model configuration points to the corresponding entry point script in the payload and the parameters that need to be passed into those scripts at runtime. In turn, the scripts are expected to return a frozen model, extended logs, and performance metrics associated with the model.
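Putting the pieces described so far together, a solution manifest might look something like the sketch below. Every key name here is invented for illustration only, since the actual AIRESERVE manifest schema is internal to Starbucks:

```json
{
  "training": {
    "compute": "databricks-spark",
    "instanceType": "gpu",
    "workers": 4
  },
  "inference": {
    "cpu": "1",
    "memory": "2Gi"
  },
  "experiment": {
    "name": "theia-object-detection",
    "models": [
      {"name": "faster-rcnn", "entryPoint": "train_frcnn.py",
       "parameters": {"epochs": 50}},
      {"name": "yolo-v3", "entryPoint": "train_yolo.py",
       "parameters": {"epochs": 50}}
    ],
    "rankBy": {"metric": "mAP", "goal": "maximize"}
  }
}
```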

The project owner can then define the performance metric that should be used to assess the best model. The model that ranks number one based on this configuration will be promoted to model registry, and eventually as an endpoint.
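The best-model promotion described above amounts to ranking the training runs by the configured metric. This is a hypothetical sketch (the run structure and metric name are invented), not the platform's actual selection code:

```python
def promote_best(model_runs, metric, higher_is_better=True):
    # Rank trained models by the configured metric; the top one would be
    # promoted to the model registry and, eventually, to an endpoint.
    key = lambda run: run["metrics"][metric]
    return max(model_runs, key=key) if higher_is_better else min(model_runs, key=key)

# Hypothetical runs produced by the training step.
runs = [
    {"model": "faster-rcnn", "metrics": {"mAP": 0.81}},
    {"model": "yolo-v3",     "metrics": {"mAP": 0.74}},
]
best = promote_best(runs, "mAP")
```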

The project owner also defines the environmental dependencies that need to be in place for the inference to work, and that’s specified in a separate YAML file.
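Since the tracking server here is Azure Machine Learning, that dependency file is plausibly a conda-style environment YAML, which Azure ML commonly consumes. The sketch below is purely illustrative; package names and versions are assumptions:

```yaml
# Hypothetical inference-environment file; names and versions are illustrative.
name: theia-inference
channels:
  - defaults
dependencies:
  - python=3.7
  - pip:
      - tensorflow==1.15
      - numpy
      - pillow
```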

Now, with the addition of this solution manifest, we’ll be able to onboard this project into AIRESERVE and leverage the model flow automation into production. Let’s double-click on the second step of the publishing process: onboarding the project into AIRESERVE. We onboard the project by leveraging the provisioning automation on the platform side. This process deploys the end-to-end Starbucks-hardened MLOps infrastructure and the model flow into AIRESERVE for this project.

A typical provisioning takes about 15 to 20 minutes, so we will take a look at one of the previous runs.

Our solution uses Databricks and ephemeral Spark clusters as a scalable training infrastructure. We provision a Databricks workspace with the AIRESERVE bits installed in it. We harden the Databricks workspace for data access to EDAP.

We go on to deploy the Starbucks-hardened Kubernetes cluster, and configure a tenant in our shared Kubernetes setup to host the inference endpoint. We enforce strict isolation and resource limits per standards.

We go on to deploy the tracking servers next. We deploy Azure Machine Learning to track the experiments, models, and other artifacts that are produced for this project with every run. This process also sets up the container registry for publishing the images.

The onboarding sets up an API endpoint for the project in the AIRESERVE API management to control access policies and facilitate A/B testing.

Finally, the project gets an Azure DevOps portal and model build and release pipelines based on the AIRESERVE blueprint. This standardizes the model flow into AIRESERVE and production. The build pipeline is triggered whenever changes to the underlying Git repo are observed.
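A repo-triggered build pipeline in Azure Pipelines might be configured along these lines. This is a rough sketch only; the branch name, script, and step are invented, and the real AIRESERVE blueprint is internal:

```yaml
# Illustrative azure-pipelines.yml; the real AIRESERVE blueprint differs.
trigger:
  branches:
    include:
      - main        # any push to the repo kicks off a model build

pool:
  vmImage: ubuntu-latest

steps:
  - script: python run_model_flow.py --manifest manifest.json
    displayName: "Train, validate, and register models per the manifest"
```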

So what you get at the end of the onboarding process is MLOps-in-a-box, deployed on demand for this project. It seamlessly integrates with the platform, and the build and release pipeline continuously builds and deploys models to production.

AIRESERVE Blueprint for the project

This is the AIRESERVE blueprint that got provisioned in the previous step. As the build pipeline kicks off, it prepares the Databricks workspace with the latest framework and project bits. The AIRESERVE driver notebook parses the solution manifest and does the data setup, training, and validation for each model defined in the manifest, on an ephemeral Spark cluster. The serialized models and performance metrics are tracked on the tracking server, which is Azure Machine Learning. The best model is assessed based on the performance criteria defined in the solution manifest and promoted to the model registry. A scoring image is built for the frozen model with all necessary instrumentation and uploaded into AIRESERVE’s container registry. As new images become available in the container registry, the release pipeline is triggered, which deploys the prediction endpoint to the AIRESERVE Kubernetes instance and exposes it securely through API management to the application developers.

The complete flow we discussed now is implemented in the build and release pipeline that was automatically generated by the onboarding process.

– So by leveraging the platform, can you tell us a little bit about the benefits? But before you do that, what’s coming next for AIRESERVE? – So, Denny, we are trying out a couple of new usage models for AIRESERVE based on the patterns we have seen across the ecosystem. One pattern that is commonly evolving is: can we train the model elsewhere, like in a third-party cognitive service, and bring that into AIRESERVE to deploy it for Starbucks consumers? The other model that is evolving is around training the model in AIRESERVE and deploying it elsewhere, like on edge compute that another team could be managing. We make sure our implementation is open and based on containers, so that we stay flexible for different usage models. As we speak, our teams are prototyping some of these.

Another area that our team is heavily invested in is coming up with a model storefront experience, which is a big bet for us. We wanna be able to provide more self-service management capabilities for both the publishers and consumers on the platform. As a publisher, we want you to have the ability to control your A/B tests for the models. We want to make sure you can control your policies and access controls, and that you can get enough analytics and insights around how your models are being consumed in the marketplace. And as a consumer, we want to make sure these models are easily discoverable and easily consumable, without you having to be a technical geek. So regarding the value proposition of the AIRESERVE: we have managed to bring down the lead time to get a secure, scalable ML infrastructure for a project from months to minutes. People don’t have to put together custom infrastructure for this. They don’t have to go through enterprise security reviews. All of those things are pre-baked into the platform that we deploy, and it is available on demand. From a mean-time-to-deploy perspective, people used to spend days collaborating across the data scientists, the machine learning engineers, and the app developers to push models to production. We have essentially brought that down to a matter of hours by completely automating the process. So now, teams are able to deploy multiple versions to production without disrupting application developers.

– Perfect, well thanks very much. We really hope that you’ve enjoyed today’s session, and as you can see, the transition from legacy to AI is not just about the technologies, though we certainly have reviewed a lot of it today. It’s also about the change in the operational infrastructure, the processes, and perspectives that allow Starbucks and many other companies, for that matter, to scale their machine learning and AI.

About Balaji Venkataraman


Balaji R Venkataraman is an Engineering Manager with the Enterprise Data and Analytics Platform team at Starbucks. His team ships and operates on-demand platforms on the Azure cloud that power petabyte-scale data engineering and at-scale ML/AI development and operationalization across Starbucks. These offerings shape multiple next-generation personalization and retail optimization initiatives.

About Denny Lee


Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments. He also has a Master of Biomedical Informatics from Oregon Health and Science University and has architected and implemented powerful data solutions for enterprise healthcare customers. His current technical focuses include distributed systems, Apache Spark, deep learning, machine learning, and genomics.