Amid the increasingly competitive brewing industry, the ability of retailers and brewers to provide optimal product assortments for their consumers has become a key goal for business stakeholders. Consumer trends, regional heterogeneities and massive product portfolios combine to scale the complexity of assortment selection. At AB InBev, we approach this selection problem through a two-step method rooted in statistical learning techniques. First, regression models and collaborative filtering are used to predict product demand in partnering retailers. The second step involves robust optimization techniques to recommend a set of products that enhance business-specified performance indicators, including retailer revenue and product market share.
With the ultimate goal of scaling our approach to over 100k brick-and-mortar retailers across the United States and online platforms, we have implemented our algorithms in custom-built Python libraries using Apache Spark. We package and deploy production versions of Python wheels to a hosted repository for installation to production infrastructure.
To orchestrate the execution of these processes at scale, we use a combination of the Databricks API, Azure App Configuration, Azure Functions, Azure Event Grid and some custom-built utilities to deploy the production wheels to on-demand and interactive Databricks clusters. From there, we monitor execution with Azure Application Insights and log evaluation metrics to Databricks Delta tables on ADLS. To create a full-fledged product and deliver value to customers, we built a custom web application using React and GraphQL which allows users to request assortment recommendations in a self-service, ad-hoc fashion.
Justin Morse: Hi, everyone. Thank you all for coming today, we are very excited to share the work we’re doing at Anheuser-Busch on building a Product Assortment Recommendation Engine for Brick-and-Mortar Retailers. We’ll start with some introductions. I’m Justin Morse. I’m a staff data scientist at Anheuser-Busch, where I lead a team of data scientists in developing product recommendation engines. I’ve been at the company for two years, and before that I was in academia, where I was solving problems in the biomedical space.
Ethan DuBois: Hey everybody, my name is Ethan DuBois. I’m a senior software engineer here at AB InBev, and I lead a team of software engineers that works together with the data science team and Justin to deploy and productionize machine learning solutions and products for use by our internal stakeholders. I’ve been here at AB for about a year full time. I was previously a senior engineer at Insight Digital Innovation, where I worked in the data and analytics space across a number of different industries. So I’m happy to be here.
Justin Morse: So today, Ethan and I will be discussing our team’s journey in trying to answer one of the most fundamental questions of our industry, which is the question of, “which products should retailers carry in order to maximize their profit?” I’ll touch a little bit on the data science solution that we’ve implemented to solve this problem. Then Ethan will pick up and discuss how his team of software engineers went about deploying the solution in the Azure ecosystem. And finally, we’ll share a few of the lessons learned along the way.
Anheuser-Busch is a brewery that’s based in the United States. Together with our parent company, we sell over 2,000 unique products, and many of you are probably familiar with our brands, which include Budweiser, Stella Artois, and Michelob Ultra. In North America, Anheuser-Busch sells to over 150,000 different retailers, encompassing a large variety of formats, including supermarkets, convenience stores and liquor stores, but also your neighborhood bars and restaurants.
So Ethan and I sit on the solutions team within the company, whose mission is to identify where technology can be applied to benefit the business. Our solutions team is distributed across North America with hubs in St. Louis, New York and Toronto. For the past few years now, we’ve been working to identify areas in our day-to-day business operations that can be augmented by machine learning and data science. That endeavor began in earnest in 2018 with the launch of our data science platform called Lola, which at the time relied mostly on in-house infrastructure for compute power. Like many of you here, though, we quickly realized the benefit of switching to cloud infrastructure and moved to the Azure ecosystem, including Databricks.
We have also heavily invested in machine learning research and development. We’ve partnered with groups at MIT to address some of the most challenging questions that our business has been facing, and in 2019 we began a collaboration on the product recommendation engine that we’ll be talking about in today’s session. And finally, within the past year, we’ve set up a new organization within the company called BeerTech, centered around the goal of delivering business value through machine learning and data science. As of this morning, we currently number around 73 employees.
So the data science team was first approached by the business in early 2019 with what was then a deceptively simple question: which products should a retailer carry in order to maximize their bottom line? That is to say, which products should a retailer actually display on their shelves and sell in their store to customers? At the time, product selection was driven mostly through a few business KPIs and through word-of-mouth recommendations between our sales representatives and retail owners. And when product portfolios were small, this was a completely acceptable solution, totally sufficient for the needs of our business.
However, with the recent explosion in product portfolio sizes, due mostly to the proliferation of craft brands and non-traditional alcoholic beverages like seltzers, arriving at a product assortment is actually extremely complex. This is what I’m trying to show here in the right-hand figure: if we assume that the average retailer carries about 100 products and can choose from a portfolio of about a thousand products, there are actually more than 10 to the 100th different ways that a retailer can design the assortment that they carry in the store.
And of course, this total number of combinations scales combinatorially with the size of the retailer, the number of items they can carry, but also the size of the product portfolio. I’ll just remind you that Anheuser-Busch alone sells over 2,000 unique products, and that’s not counting the number of products that are offered by our competitors. So this is an extremely difficult problem for basically all the retailers that we sell to.
So intuitively, we know that within this search space of 10 to the 100th combinations of assortments, most of them will probably fare worse than what is currently sold in the store. However, it stands to reason that at least one of those assortments, maybe even more than one, may not only increase retailer performance from the business perspective, but might also be more satisfying to our customers.
So our question then became: how can we develop a quantitative approach to assortment planning that accounts for customer preferences, business priorities, and also the computational complexity of the problem? Working with our academic partners, our data engineers and our software engineers, the data science team at Anheuser-Busch has landed on a multi-step pipeline for assortment planning. Our pipeline begins with a data layer, where we ingest multiple data sets related to product sales, product attributes and retailer attributes. That data gets transformed and is fed into the second step of our pipeline, where we generate exhaustive selection probabilities for the entire combination of stores and products within a product portfolio. Those selection probabilities are then fed into the next step, which we call assortment optimization, where they’re combined with information about product pricing and also business logic in order to arrive at a numerically optimal product assortment for each retailer within our project scope.
And then finally, we measure the real-world performance of our modeling through randomized controlled studies, otherwise known as A/B tests. So to generate these probability scores that I mentioned for each product and store combination, we use a family of models that’s popular in the world of econometrics called discrete choice models. The simplest of these models is a multinomial logit, which can be implemented in something like scikit-learn. But after playing around with that formulation, we realized we might get better model performance through a more heavily parameterized and slightly more complicated model called a cross-nested logit, or CNL. I’ll be happy to address the differences of CNL versus MNL in our Q&A. Since there were no standard packages for CNL that were immediately compatible with our tech stack, we decided to build a custom package of discrete choice models using PyTorch.
So that includes CNL, but also MNL. The architecture for our MNL model looks very similar to the figure on the slide here, where the first layer corresponds to a feature layer and the second-to-last layer provides us the probability that the product is selected in any given store. Until very recently, our model training pipeline was conducted in notebooks run manually on Databricks, and Ethan will touch on how we’ve been able to scale and deploy this training pipeline in a more robust way. The last thing I’d like to note here is that because our data sets are so large (we’re considering entire sales transaction histories throughout the US), we are also actively looking at ways to distribute our model training pipelines using tools like Petastorm.
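To make the probability step concrete, here is a minimal sketch of the multinomial logit math in plain Python. Our production models are implemented in PyTorch and the CNL adds nesting structure on top of this; the feature names and coefficient values below are made up purely for illustration.

```python
import math

def mnl_choice_probabilities(features, beta):
    """Multinomial logit: each product's utility is linear in its
    features, and choice probabilities are a softmax over the
    products available in the store."""
    utilities = [sum(b * x for b, x in zip(beta, f)) for f in features]
    m = max(utilities)                          # stabilize the softmax
    exp_u = [math.exp(u - m) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

# Toy assortment: 4 products x 3 features (price, is_craft, is_seltzer)
features = [
    [5.99, 0, 0],
    [7.49, 1, 0],
    [8.99, 0, 1],
    [6.49, 0, 0],
]
beta = [-0.3, 0.8, 1.1]  # hypothetical coefficients learned in training

probs = mnl_choice_probabilities(features, beta)
```

In the real pipeline this computation runs for every store and product combination in scope, producing the exhaustive selection-probability table that feeds the optimization step.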
And I’ll just briefly touch on our optimization pipeline here. After we generate these selection probabilities, like I mentioned, we can combine those predictions with data related to product margins or unit price to generate an objective function that can then be optimized using standard numerical techniques. I’d like to emphasize that this is the step where we work closely with our business stakeholders to build in logic concerning which products can or cannot be sold in given stores. This business logic, or these filters, can look like size restrictions, e.g., certain products may not be able to be sold in certain formats of stores because those products are too large. They can look like capacity restrictions of the stores themselves, or restrictions around license agreements, things of that nature.
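As an illustration of how the pieces fit together (the actual optimizer uses proper numerical techniques over a full objective function), here is a toy greedy sketch that combines selection probabilities with unit margins and applies a business-rule filter under a shelf-capacity constraint. All SKUs and numbers are invented for the example.

```python
def recommend_assortment(products, capacity, banned=frozenset()):
    """Illustrative greedy assortment selection: score each product by
    expected margin (selection probability x unit margin), drop products
    excluded by business rules, then fill the shelf up to capacity."""
    eligible = [p for p in products if p["sku"] not in banned]
    ranked = sorted(eligible,
                    key=lambda p: p["prob"] * p["margin"],
                    reverse=True)
    return [p["sku"] for p in ranked[:capacity]]

# Hypothetical inputs: probabilities from the demand model, margins
# from pricing data, and one business-rule exclusion.
products = [
    {"sku": "BUD-12PK",   "prob": 0.30, "margin": 2.10},
    {"sku": "STELLA-6PK", "prob": 0.15, "margin": 3.40},
    {"sku": "ULTRA-12PK", "prob": 0.25, "margin": 2.50},
    {"sku": "KEG-15GAL",  "prob": 0.05, "margin": 9.00},
]
# e.g. a size restriction: kegs can't be sold in this store format
assortment = recommend_assortment(products, capacity=2,
                                  banned={"KEG-15GAL"})
```

The real pipeline solves this jointly per store rather than greedily, but the shape of the inputs (probabilities, margins, filters, capacity) is the same.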
So at the end of this optimization process, we end up with a PySpark data frame that contains an ideal product assortment for each store within our project scope. That data frame can then be fed into any front-end that our company may use for assortment planning. And as I mentioned, we assess the performance of our models through A/B testing in brick-and-mortar stores. We have, in fact, just completed a small-scale pilot in Ontario, where we examined the effect of our modeling intervention in a hundred stores, and based on some very encouraging results from that pilot, we have decided to expand into greater North America. We have been working with Ethan’s team to scale our solution to enable that launch.
And finally, I would like to share with you how we actually went about implementing our pipeline for the pilot. You’ll see here a screenshot of our Databricks environment with our workspace, and within our workspace you can see multiple notebooks that correspond to different pieces of functionality within our pipeline. In order to run the pipeline from end to end, we ended up chaining these notebooks together in a sort of execution notebook and manually triggering that execution notebook on a regular cadence; in the case of this pilot, it was on a monthly basis. So now I’m going to hand it off to Ethan, who will talk about how his team of software engineers took this notebook-style solution of chained notebooks running in a single execution notebook and turned it into a fully fledged service.
Ethan DuBois: All right. Yeah. Thanks Justin, for that great explanation of the problem that we’ve been trying to solve. So, as Justin mentioned, after a number of successful pilots, we were now tasked with building a more robust solution that included a few things at minimum. Production-quality code standards, in particular an object-oriented approach with some thought toward code design, extensibility and reusability. Best-practice code distribution, so using version control and repository-based automated CI/CD pipelines to distribute our code. We wanted to take a more flexible and lightweight configuration approach so that we could really decouple the configuration of the pipeline from the actual code, as well as have decoupled communication between the components of the service we were building. And we wanted to take an infrastructure-as-code approach so that we could scale things up and down as necessary to meet demand more easily.
And finally, most importantly, we had to create some sort of API and expose it so that we could integrate our service with other applications. Ultimately, this would look like a service that could take an ad hoc request for assortment recommendations for a list of stores and then deliver back product recommendation results. Our ultimate goal here was to basically take this pipeline, productionize it, and orchestrate it using best-practice approaches, specifically introducing more DevOps concepts like, as I said, infrastructure-as-code, automated deployment and serverless architecture. These pieces fall into three main categories: code design, configuration and orchestration. So I’m going to walk through our approach to each of these today, and I’m going to move pretty quickly. First, just to kick it off, here’s a high-level overview of the technologies we ended up using in each of these areas. For code, GitHub.
All our version control and CI/CD is done in GitHub, and for distribution, we have an organizational JFrog Artifactory with a PyPI repository from which we can install all of our custom-built Python wheels. For configuration, we chose to use Azure App Configuration and Key Vault because we’re already deep in the Azure space; as Justin mentioned, we’re using Azure Databricks, so it made a lot of sense to consider config technologies in the Azure cloud. And then for orchestration, obviously Databricks as our core compute engine, in particular using the Databricks API to kick off jobs. We also introduced some serverless components like Azure Event Grid and Azure Function Apps, as well as Azure App Insights for logging. So let’s dig into our approach on each of these.
So let’s talk about code design first. Before we even thought about orchestrating this, we had to do a little refactoring. The chained-notebook approach didn’t provide us the ease of maintenance and visibility that we really wanted. With so many notebooks, it was easy to get lost, and there was a lot of added complexity in the chaining of the notebooks and the dependent notebooks, so it was difficult to standardize all of those to help control code quality and to onboard new team members. So we refactored this into what we call Process-controlled Python Pipelines.
As I mentioned already, we took an object-oriented approach here. We extracted the core functionality from our demand estimation machine learning code and the assortment optimization code into their own classes, and then we generalized a number of those machine learning models and implementations into some shared tools and utilities that could be imported from the pipeline itself and used as we built those pipelines. We also implemented a utilities module, under development in parallel with the work on our pipelines, that helped us standardize our approach to reading and writing data, file IO, logging, configuration, et cetera, across the entire set of pipelines.
So now we had the code in this well-structured, object-oriented format, which made it much easier to package the code and distribute it to our JFrog repository. We’ll talk more about GitHub a little bit later. I’ll just show a few shots of what the code looks like. This is example code from the optimization pipeline itself, and you can see we’re importing a number of different internal shared modules up top that include that functionality around data post- and pre-processing, as well as our core assortment optimization module.
This next slide is an example of the execute-process method that we’re using to run the pipeline from end to end. I won’t get too deep into this, but notice at the bottom here that we’re reading in some data, the demand estimates to be used in the optimization process, and we’re using an internal utility that we created that wraps the PySpark spark.read method. Finally, at the end of this pipeline is where we write out our assortment optimization output. You can see we’re doing some profiling here to try to understand how long the code is running and which parts are taking a long time, but the main idea is we’re writing our results back out into our Delta data lake. So let’s move on to packaging and deployment. Now that we had this code refactored from a quality and process standpoint, we could actually start distributing the code as custom Python wheels to JFrog for installation on our production resources.
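The profiling mentioned above can be as simple as a timing context manager around each pipeline step. This is a hypothetical sketch, not our actual utility; the step name and the stand-in work inside the block are illustrative.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_step(name, timings):
    """Record wall-clock time for a named pipeline step, so we can see
    which parts of a run are slow (e.g. the optimization solve vs IO)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

timings = {}
with timed_step("write_assortment_output", timings):
    # stand-in for the actual Delta write of the optimization output
    results = [sku.upper() for sku in ("bud-12pk", "ultra-12pk")]
```

Collecting the timings into a dict like this makes it easy to log them all at once to App Insights at the end of a run.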
As I mentioned before, we use a combination of GitHub CI/CD workflows with GitHub Actions, as well as Artifactory, to do that. So when a PR was merged after someone made some changes, the release pipeline would kick off, increment the semantic version, upload that new version of the wheel to the repository, and then it could get installed. Right here is an example of one of these CI/CD pipelines. It’s essentially just a YAML file using GitHub Actions, and I’ll point out that at the very end it uses the JFrog CLI to upload the wheel. As I mentioned on the last slide, we’re hoping to eventually move to GitHub Packages, but they don’t yet support PyPI. When they do, we’ll hopefully be able to get rid of JFrog and use something a little simpler, like GitHub Packages.
Of course, for now we’re on JFrog, and this is just a quick screenshot of what our JFrog Artifactory repository looks like. You can see each of the semantic versions of the code pushed up and just sitting there waiting to be installed.
So let’s move on now to configuration. We had the code refactored; now we needed a way to actually decouple the code from the configuration, so we didn’t have to create a new build of the code every time we changed some sort of setting. We had previously used a JSON file inside the repository to configure everything, but that meant, of course, that we had to do a lot of different releases, since the JSON file was included as the code was packaged. To accomplish that decoupling, we used Azure App Configuration, which is a universal hosted parameter store in Azure.
In there we included service-level configs, things like algorithmic constants and other machine learning settings, ranges, thresholds, et cetera, as well as execution-level configs, which were things like file names, storage locations, logging and cluster configuration. Then for secret storage we used, as you’d expect, Azure Key Vault for all of our keys and connection strings to our Azure resources, including our data lake, Event Grid, et cetera. What was really handy is that we had the ability to back a Databricks secret scope with that key vault, which gave us easy access to those secrets from within the code itself, as well as from init scripts and Spark environment variables.
Here’s a quick example of what that config looked like. It was simply a JSON file that lives in the repository. We’ve got some logging and storage information here around our Azure Data Lake Store, we’re setting log levels, we have an App Insights logging handler that we’re referencing, and notice that we’re mentioning the use of Databricks, including even some Databricks Spark conf attributes as well. And here’s an example of how we actually configure the Databricks cluster as code. If you’ve worked with the Databricks API before, you’re probably familiar with this, because we’re essentially just creating a new on-demand cluster here: we’re specifying the runtime version, the number of nodes, a number of environment variables and some init scripts. This is how we were able to store our cluster configuration as code in a hosted store outside of the actual code wheels themselves.
I’m going to skip over this next slide. Here’s a quick look at our Azure App Config. This is the actual resource in Azure, and you’ll see we have a number of different configurations here; in particular, I’ll point out the demand estimation and optimization configs as separate settings. We can also use labels to denote different environments, so you can extract configurations based on a prod or a dev label, depending on which environment the code is running in. This approach also adds flexibility as we think about scaling to universes of different demand estimates, think retailers or geographic regions. With these building blocks, we were able to create a highly customizable, lightweight configuration approach that abstracted the majority of our configs away from the actual packaged code. And I’ll point out one more thing: we were able to set up some GitHub workflows to sync these JSON files automatically with our Azure App Config, so that when PRs were merged, depending on the environment, changes would be applied automatically to the deployed configs in App Config.
So finally, let’s move on to orchestration. This is the part that I’m the most excited about and that I had the most fun building. Now that we had solutions in place for our code design and our approach to configuration, we had to figure out how to actually deploy and orchestrate the execution of these pipelines, in particular as a result of ad hoc requests from some calling service. That included: how do we kick off the job? Where and how do we run the job? And how do we set up the communication between the different components of the service in an asynchronous fashion? Because we knew some of these jobs were going to be pretty long-running. To expose the service as an API, we chose to use Azure Functions, which is an event-driven serverless compute platform that’s intended to solve exactly these types of problems.
So we developed and deployed a couple of simple Python functions that would submit or potentially cancel a Databricks job, and they use some internal utilities to basically wrap the Databricks API via the Python SDK. With the optimization and the demand estimation machine learning code structured as discrete processes inside these Python wheels, like we showed earlier, we could just run a simple controller script that was submitted along with the API request to kick off the job on Databricks, passing through any command-line parameters we would need, and have that execute on an on-demand cluster as configured inside our Azure App Config. So that is how we chose to execute the jobs and kick them off. We then just had to figure out a way for this running job to communicate with the calling process that kicked it off, as well as any other subscriber that might want information about how far along the job was, or whether there were any errors or status updates.
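Conceptually, the function's job is to translate an incoming HTTP request into a Databricks runs-submit call. The `runs/submit` endpoint is part of the real Databricks Jobs API, but the helper below, the controller script path, and the parameter names are all illustrative, not our actual code.

```python
import json

# Databricks Jobs API endpoint for one-off runs on on-demand clusters
DATABRICKS_SUBMIT_ENDPOINT = "/api/2.0/jobs/runs/submit"

def build_submit_payload(stores, wheel_paths, cluster_config):
    """Build a runs/submit request body: an on-demand cluster spec
    (loaded from App Config), our packaged wheels as libraries, and a
    controller script that receives the ad hoc request parameters on
    the command line."""
    return {
        "run_name": "assortment-optimization",
        "new_cluster": cluster_config,
        "libraries": [{"whl": whl} for whl in wheel_paths],
        "spark_python_task": {
            "python_file": "dbfs:/controllers/run_optimization.py",
            "parameters": ["--stores", ",".join(stores)],
        },
    }

payload = build_submit_payload(
    stores=["store-001", "store-002"],
    wheel_paths=["dbfs:/wheels/shelf-1.4.2-py3-none-any.whl"],
    cluster_config={"spark_version": "7.3.x-scala2.12", "num_workers": 8},
)
# Inside the Azure Function this would be POSTed to the workspace URL
# with a bearer token; the HTTP call is omitted to keep the sketch
# self-contained.
body = json.dumps(payload)
```

Because the cluster spec and wheel versions come from configuration, the same function can submit jobs to different environments without redeployment.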
So we chose to use Azure Event Grid, which, if you’re not familiar, is a service in Azure that manages routing of events from any source to any destination. It’s organized around the concept of topics, which are essentially logical channels that can be published or subscribed to, and it’s inherently asynchronous in that all of the communication happens through messages sent between publishers and subscribers. So no individual component is polling for information on some scheduled basis to figure out whether some other piece is done; it’s all asynchronous. The way we had this set up in the end was that we were receiving events to kick off recommendation jobs, and we were publishing events to report errors, statuses and other information. So we had this really neat connection between the different components of the service that was all happening asynchronously.
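A status event published from inside the pipeline can be a small JSON document in the Event Grid event schema. The schema fields (`id`, `eventType`, `subject`, `eventTime`, `data`, `dataVersion`) are real Event Grid concepts; the event type names and job IDs below are invented for the sketch.

```python
import json
import uuid
from datetime import datetime, timezone

def build_status_event(job_id, status, detail=""):
    """Build a status event in the Event Grid event schema, published
    from within the running pipeline so subscribers (the web app,
    alerting, etc.) can follow a long-running job without polling."""
    return {
        "id": str(uuid.uuid4()),
        "eventType": f"Assortment.Job.{status}",  # e.g. Running, Failed
        "subject": f"jobs/{job_id}",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "data": {"jobId": job_id, "detail": detail},
        "dataVersion": "1.0",
    }

event = build_status_event("run-42", "Running", "optimization started")
# The actual publish is an HTTP POST of a list of such events to the
# topic endpoint with an access-key header; omitted here.
serialized = json.dumps([event])
```

Subscribers filter on `eventType` or subject prefix, so the web app can react to failures while a metrics consumer listens to everything.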
So let’s take a quick, deeper dive here, and then we’ll zoom out and look at the overall diagram. Here’s a short example of one of the HTTP Azure Functions we wrote. It’s just Python, and it’s very simple. You can see it accepts an HTTP request coming off the endpoint, and then some of those input parameters get parsed out and eventually passed through to the Databricks job that gets kicked off, which I’ll show here. If you’ve used on-demand clusters or the runs/submit API, you’ll be familiar with this page; this is an example of one of the actual runs that we’ve done.
And I’ll point out that the command-line arguments here are really the arguments coming in off of the endpoint that say: here’s the store we want to run, here are some products we want to make sure we honor or remove. Anything else that would be required could be passed through on that endpoint. Then there are also some dependencies here, and this is where our code (our utils library, our tools library, and then our assortment code, which is called Shelf here) is listed; these are all specified in Azure App Config so that they get installed at runtime on the on-demand cluster.
We also use App Insights for logging, so we’ve got thousands of trace messages and pieces of information coming out of both the Azure Functions and the machine learning code that get pushed to App Insights for further analysis. Things like data quality metrics and model evaluation metrics for the demand estimates are currently just pushed out to App Insights, and we’re going to continue to figure out how to leverage some of those in a more programmatic fashion. But to put this all together, and I know this has been a lot, here’s a high-level overview of the way the service works as it exists, with a specific focus on how these different components communicate with one another.
I’ll point out in the upper left here that there’s a web app, a web tool, and a Postgres database. That’s how we chose to create the first version of the product: deploy the service, then have a web tool be the window through which a user could interact with it. We may change that, or we may potentially call the service in a batch fashion some other way, and that was always the intention. But in this case, once a job needs to be run, the web app publishes a start-optimization event, which gets pushed onto our Event Grid topic.
And over here on the right, our Azure Function will pick up that event and its parameters and kick off the optimization job on an on-demand cluster in Databricks. From there, the pipeline runs, and it’s actually publishing its own events back to that Event Grid topic from within the Python code, such as: I’m running, that was a valid request, that was an invalid request, I had an error, et cetera. An invalid request could be something like: we don’t have data for this store, so we can’t run the optimization because we don’t have the demand estimates, or something along those lines.
And then notice that the job is reading and writing from our Azure Data Lake Store here, which Justin mentioned; our Delta layer is providing all the demand estimates and input data for the service. Then finally, there’s a Data Factory pipeline in Azure, which kicks off and does a little bit of post-processing and ETL on the results to deliver them back out to be surfaced in the application. This here is really what I’m most excited about, and what I was most excited about sharing, because our goal was not just to scale and deploy this machine learning pipeline, although we did do that; we wanted to create a solution that really adopted best-practice approaches to orchestration, particularly around highly configurable on-demand compute, serverless cloud components and serverless architecture. We were able to do that using App Config, Event Grid and Azure Functions, along with the Databricks API.
So we’re really excited about this pattern and excited to share it. Our hope is to continue to iterate on it and make it even more robust, and I would encourage anyone who’s looking to scale and productionize something like this to consider these types of concepts and technologies as you plan and architect it. Just to bring this to a close: we do have an MVP here that’s released and in production. Users are using it and giving us lots of initial feedback so we can continue to iterate. And we learned a lot of lessons. I’ll just focus on one of them, which we learned around the development process: we had a number of developers who were either developing locally in their own IDEs or working in the Databricks workspace using notebooks.
For those developing locally, they got all kinds of great things like debugging and other tools that they loved, but the process of actually packaging up and installing that code on clusters was tedious and error-prone. So to accommodate them, we built some additional GitHub workflows which would build and deploy beta versions of the code to our JFrog repository and make those much more easily installable on our Databricks environments. We were able to accommodate those developers, but it was a lesson learned about different people developing in different ways.
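The key trick in such a beta workflow is giving development builds versions that sort below the next real release and never clash with it. One common convention (this is a sketch of the idea, not our actual workflow logic) is a PEP 440 `.dev` pre-release segment keyed to the CI build number:

```python
def next_beta_version(latest_release, build_number):
    """Given the latest released version, produce a PEP 440 dev
    pre-release for a beta build, e.g. 1.4.2 -> 1.4.3.dev7, so beta
    wheels can be pushed to the repository without clashing with
    real releases (1.4.3.devN sorts before 1.4.3)."""
    major, minor, patch = (int(x) for x in latest_release.split("."))
    return f"{major}.{minor}.{patch + 1}.dev{build_number}"

beta = next_beta_version("1.4.2", 7)
```

pip then installs these only when a developer asks for a pre-release explicitly, so production clusters keep pulling stable versions.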
A quick look at the future roadmap: we would like to look at increasing parallelization and distribution for both our model training and the optimization process itself. We want to add some additional intelligence throughout the service around job progress and ETAs (how long is it going to take to complete?), and take a more enhanced DevOps approach to cloud resource deployment, with infrastructure-as-code using Terraform and some other tools. We’re also, I will mention, looking at using the Databricks Container Service to rethink how we’re managing our dependencies and our Databricks environment in an even more fine-grained way.
And with that, we’d just like to thank all the members of our team who contributed to this project. It’s been fantastic to work with such dedicated, intelligent colleagues. We’d like to thank everyone for attending today’s session. We’re happy to answer any questions in the chat about how we’ve implemented this and about our data science approach. So thanks so much.
Ethan DuBois is a Senior Software Engineer on the BeerTech team at AB InBev. He works with a core group of engineers to build and deploy scalable software products that drive business decisions and en...
Justin Morse is a Staff Data Scientist on the technology team at Anheuser Busch. He leads a data science team that develops and deploys recommendation engines to improve product selection by retailers...