Offers are important sales drivers in the fast-food industry. Being able to segment customers based on their offer preferences and assigning the best offer sets to each segment is critical for a customer-centric recommendation system to enhance user shopping experience.
Directly applying recommendation models to this use case would cause the “black box” problem where marketers can have difficulty understanding the model’s decision logic. It’s also challenging for marketers to apply different offer strategies to different customers. On the other hand, traditional customer segmentation approaches won’t generate offer recommendations and therefore they still rely on marketers to manually assign offers to each user segment.
At Burger King, we have developed our own offer recommendation system that leverages pre-trained BERT and Inception models on Apache Spark to extract feature representations directly from offer descriptions and images, followed by Spark MLlib to form the user segmentations. Such a system can discover customer segmentations based on their offer preference and can also directly generate offer recommendations from these resulting segments.
Furthermore, it is robust enough to handle the constantly changing offer pool and newly joined customers without the need to retrain the model frequently. In this session, we will discuss our offer recommendation system in detail.
Luyang Wang: Hi, everyone. Thanks for listening to our session today. My name is Lu, and I’m the Director of Machine Learning and Data Science at Restaurant Brands International. We are the parent company of three iconic fast food brands: Burger King, Popeyes, and Tim Hortons. My team is responsible for all three brands’ machine learning and data science initiatives across the global markets. I’ve always been a regular attendee of Spark Summit, and I’m very excited to be presenting our work at Data and AI Summit today.
I will start with a use case introduction and an overview of our DeepFlame Offer Recommendation System. And Kai from Intel will talk about the end-to-end system on big data with Apache Spark.
So offers have always been very important in the quick service restaurant industry. They are always one of the main sales drivers. If you open your Burger King app today and go to the offer section, that’s where we present all the offers to our guests. Previously, before we went down the personalization route, all customers saw the same offers, so it was generic. Our marketing team picked the best offers for the guests, but the same set of offers cannot fit everyone’s needs. When you think about it, different customers have different offer needs. Some like party-size offers to feed the whole family. Some are single customers, right? They don’t really need party-size offers.
Also, different locations serve offers at different pricing tiers. So, for example, [inaudible] is okay for the majority of the market. But in a high-spending area like New York, that price point might not make sense for the local market.
Also, some offers can be time-sensitive. For example, a breakfast offer can only be redeemed during breakfast hours. And we also have some other time-sensitive offers, which require personalization. So a generic offer set does not really fit in that situation.
So when we think of offer recommendation and look at the industry, there are two common approaches. The first is one-to-one recommendation, which treats offer recommendation similarly to product recommendation in the e-commerce industry. Some common approaches are collaborative filtering and some deep learning-based approaches, like Wide & Deep [inaudible]. It’s all very [inaudible], treating each customer separately and making recommendations based on that.
The first challenge there is, of course, low interpretability. With deep learning models, it’s kind of a black box situation, so it’s very challenging to understand why a model makes certain recommendations. And also, one key difference between offer recommendation and product recommendation is that, in the case of offers, we most likely need to treat different customers differently in different situations.
So for example, if a customer does not visit us very often, we need to think about recommendations that can drive repeat frequency. And for other customers, if they already come frequently enough, then we need to think about whether we can have them add on other items, or shop during the daytime if they don’t normally [inaudible]. So there are multiple different goals here. It’s not one generic goal such as conversion or add-on sales, which makes one-to-one recommendation kind of a challenge there, too. It’s not flexible enough to maximize on different goals.
And also, when we think about collaborative filtering, Wide & Deep, those kinds of models, they create a huge number of user embeddings to manage. We are talking about millions of customers. That starts to cost you at online inference [inaudible], where the user embeddings get too big for the system to manage.
But when you look at the other side, customer segmentation is also very popular. The idea behind that is assigning customers to different segments, so marketers, who are the subject matter experts, can go in, look at each segment, see what its characteristics are, and then decide how to assign offers to each one differently. Some common approaches are the traditional RFM model and various [inaudible] models: K-Means, K-Modes, DBSCAN, all very popular.
The challenge there is that assigning users to segments will not necessarily tell the marketers what offers to assign. You do get users assigned to each segment, but you still need marketers to work with the data team to come up with ideas for what offers to assign to each segment. This process can be kind of painful because you are dealing with manual assignment work, and sometimes those manual assignments can be too much for the business and the data team to handle.
So when we thought about the new offer recommendation system, we had a couple of goals in mind. First, it needs to be interpretable. It cannot be a black box, as we explained previously. Marketers need to understand what’s behind those offer assignments and what’s driving those recommendations. So interpretability is very important.
Also, a robust system will keep track of the users’ movement across different segments. So we know if a user is moving up, right, in terms of their spending and engagement with the brand, or moving down. And that’s very important to know.
And also, marketers should be able to leverage the system to maximize on different goals for different customers. For users who lack frequency, we should be able to leverage the system to drive frequency. For users who, let’s say, do not visit us during breakfast hours, we should be able to leverage the system to drive them to come back for breakfast. So it needs to be very flexible on that.
The last point is, of course, fast deployment and easy maintenance. It needs to be a system that’s very easy to maintain in production and able to scale to all our customer groups.
And here we come to our new offer recommendation system. We call it DeepFlame. It’s a new system we built in-house together with a [inaudible] team. At a high level, there are a couple of different components here.
The first component is BERT. BERT is one of the state-of-the-art pre-trained NLP models out there. In DeepFlame, we leverage BERT to understand the offer description. It comes from the customer’s perspective. We think about what customers see when they look at our offers. Obviously, the first thing they see is a description, so a [inaudible], that’s what they see about the offer. So we try to mimic the customer’s perspective. We leverage BERT to convert the text into machine-learnable embeddings for our recommendation model to learn from.
Similar to that, we also use ResNet, which is a pre-trained image model, because when customers look at our offers, they also see the images, so that’s quite important. We leverage ResNet to convert the images into embeddings that, again, can be learned by our machine learning model down the line.
And TXT is the new model we developed. Its full name is Transformer Cross Transformer. We can attach the paper link and code link down here. If you’re interested, you can take a look. We also have other sessions online that talk about TXT. This is a new recommendation model we developed here. Essentially, we leverage a new double transformer architecture. We have one transformer focusing on the sequence of the ordering and the user purchase history data, and another transformer focusing on the [inaudible], like what your device is and what time you open the app, all those real-time context features. And then we combine these two using a latent cross to predict the next items you’re likely to purchase. So again, the paper and code links are attached here. Definitely check it out.
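As a rough illustration of the latent cross step just described, the sketch below combines a sequence representation and a context representation and then scores candidate items. It assumes the latent cross is an element-wise product and that items are scored by a dot product followed by a softmax; the vectors, item names, and function names are made up for illustration and are not taken from the TXT code.

```python
import math

def latent_cross(seq_repr, ctx_repr):
    """Element-wise product of the two transformer outputs (assumed form of the latent cross)."""
    return [s * c for s, c in zip(seq_repr, ctx_repr)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score_items(seq_repr, ctx_repr, item_embeddings):
    """Return a probability per item from the dot product with the crossed representation."""
    joint = latent_cross(seq_repr, ctx_repr)
    names = list(item_embeddings)
    logits = [sum(j * e for j, e in zip(joint, item_embeddings[n])) for n in names]
    return dict(zip(names, softmax(logits)))

# Toy inputs standing in for the outputs of the two transformers (hypothetical values).
seq_repr = [0.5, 1.0, -0.5]   # from the purchase-history transformer
ctx_repr = [1.0, 0.5, 2.0]    # from the real-time context transformer
item_embeddings = {
    "whopper_meal": [1.0, 1.0, 0.0],
    "breakfast_combo": [0.0, 0.0, 1.0],
    "family_bundle": [1.0, 0.0, 0.0],
}

probs = score_items(seq_repr, ctx_repr, item_embeddings)
ranked = sorted(probs, key=probs.get, reverse=True)
print(ranked)
```

The point of the cross is that the context can rescale each dimension of the history representation, so the same purchase history can lead to a different ranking at a different time of day or on a different device.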
And the last component is K-Means. Essentially, with K-Means, we are creating customer segments not based on the offer requirements but based on behavior data, such as spend, primary service mode, average ticket GPM, and visit frequency, things like that. If you look at those features, they are very stable, in that they don’t change that often. Spend is something you can keep tracking for years. That’s very important, because building a segmentation on that data ensures that we can reuse the same segments for a long period of time, so that we can keep track of user movement across different segments [inaudible].
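To make the segmentation idea concrete, here is a minimal sketch of K-Means over two behavior features. It is a plain-Python stand-in written for illustration, not the Spark MLlib pipeline used in production; the customer values, the choice of features (monthly spend and visits per month), and the deterministic initialization from the first k points are all assumptions.

```python
import math

def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical behavior data: [monthly spend, visits per month].
customers = {
    "u1": [12.0, 1.0],
    "u2": [95.0, 9.0],
    "u3": [15.0, 2.0],
    "u4": [110.0, 11.0],
    "u5": [9.0, 1.0],
    "u6": [88.0, 8.0],
}

labels, centroids = kmeans(standardize(list(customers.values())), k=2)
segments = dict(zip(customers, labels))
print(segments)
```

Because the inputs are slow-moving behavior features rather than offer interactions, the fitted centroids can be kept fixed and reused to assign new customers by nearest centroid.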
And here is how we train the DeepFlame model. This is a hybrid approach where we combine segmentation and one-to-one personalization, which allows [inaudible] experts like digital marketers to easily maintain and modify the offer rules based on segmentation while still allowing the deep learning models to automatically pick the best offers according to the preset offer rules. If you look at the whole process, again we have two starting points. I’m going to start from the left. Essentially, we start from the offer descriptions and images, using two pre-trained models to learn the vectors of the texts and the images. Then we use a transformer encoder to reduce the dimension, because the dimension of those pre-trained models’ output is quite high. So we do this [inaudible] step to reduce the dimension and get reduced embeddings that can highlight the differences across different texts and different images, so the model can easily learn their distinguishing features.
And after the [inaudible] step, we generate this yellow [inaudible], which we call the offer embedding. It is a combination of the BERT and ResNet outputs, and we feed it into the transformer encoder. From this point on is our TXT model architecture, where we have the context features feeding into one transformer encoder and the offer features, the sequence of historically redeemed offers, feeding into another transformer encoder, and then the two are combined to predict the next best offers for each guest.
And if you look from the right side, that’s where our segmentation steps come in. Essentially, we have the behavior data feeding into the K-Means model, so each customer is assigned a segment ID. Then we work with the local marketers to understand, “Hey, what are the characteristics of the different segments?” [inaudible] and [inaudible]. Based on that, they start creating an offer rule set. They don’t need to tell us, “Hey, these are the 10 or 20 offers we want you to assign, in rank order.” They don’t have to do that. However, they need to provide guidance through what we call the preset offer rules. They need to define, “Hey, for this segment, what number of lapsed offers do we need to assign? What number of cross-category offers do we need to assign? What number of high-GPM offers do we need to assign?”
So those are high-level rules. They are easy to come up with and easy to maintain. The details underneath, like exactly which offers to assign, we leave to the deep learning model. So it’s a combination of the two. It’s flexible for the marketers to control the goals, what kind of behavior we’re trying to drive using the offer recommendation, but it still allows a deep learning model to intelligently select each offer under the rules to determine the final offer pool we present to each individual guest. So that’s the model training.
Now, when we talk about model inference, here again it needs to be a robust system, low maintenance, automatically picking the offers for the guest. The way it works in production is that we have this DeepFlame server, which contains two components. One is the clustering model. One is our TXT recommender. The whole thing starts when the user logs into the app. We grab their user identifier and, based on this identifier, we run a real-time query against the customer database, which contains all their purchase history: what offers they redeemed before and what purchases they made before. We use that information to generate real-time segments, which are associated with the preset offer rules.
The same information is also fed into TXT, so we know what offers the user has redeemed and what purchases the user has made before. We use those IDs [inaudible] offer database to get the embeddings we previously generated and then run inference to rank the whole offer set from the highest probability to the lowest probability. Then we combine the offer rule set with the deep learning recommender output, which is the offer ranking from the best offer to the worst offer for the customer. Combining the two, we generate a final offer set and return it to the guest, which ensures it’s a combination of business rules and deep learning models. So it’s very interpretable, and you can see the whole process [inaudible] automatically, so it’s very easy to maintain in production.
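One way to picture the last step, combining the preset offer rules with the ranked output of the recommender, is a greedy pass over the ranking that fills per-category quotas. The sketch below is an assumption about how such a merge could work, not the actual DeepFlame logic; the segment names, offer IDs, category tags, and quota numbers are all hypothetical.

```python
# Hypothetical preset offer rules: for each segment, how many offers of each
# category the marketers want in the final set.
SEGMENT_RULES = {
    "low_frequency": {"lapsed": 2, "cross_category": 1, "high_gpm": 1},
    "high_frequency": {"lapsed": 0, "cross_category": 2, "high_gpm": 2},
}

# Hypothetical category tag for each offer in the offer pool.
OFFER_TAGS = {
    "o1": "lapsed",
    "o2": "high_gpm",
    "o3": "cross_category",
    "o4": "lapsed",
    "o5": "high_gpm",
    "o6": "cross_category",
}

def build_final_offer_set(ranked_offers, offer_tags, quotas):
    """Walk the ranking from best to worst and keep an offer while its category quota lasts."""
    remaining = dict(quotas)
    final = []
    for offer_id in ranked_offers:
        tag = offer_tags.get(offer_id)
        if remaining.get(tag, 0) > 0:
            final.append(offer_id)
            remaining[tag] -= 1
    return final

# Model output for one guest: offers ranked from highest to lowest probability.
ranked_offers = ["o5", "o2", "o1", "o4", "o3", "o6"]
segment = "low_frequency"  # would come from the clustering model at request time

final_offers = build_final_offer_set(ranked_offers, OFFER_TAGS, SEGMENT_RULES[segment])
print(final_offers)
```

With this shape, the marketers only ever edit the quota table, while the order within each category always comes from the model, which is what keeps the result explainable per segment.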
So just a quick summary of some key advantages of the whole DeepFlame offer assignment process. First, it’s an interpretable recommendation system, so the embedded customer segmentation can be used to explain the logic behind the offer assignments. For example, marketers can come in and say, “Hey, this group is a lapsed group. They don’t come here very often. So our offer strategy should drive these users back.” They can add that logic into the offer rule set they create and then have the DL model pick the best offers under that rule set. So it’s maintainable and interpretable.
And then, this system can constantly track customers’ movement across different segments, because the behavior segments only use user behavior-level features. So let’s say you have a new offer or a new set of users coming this week; that’s okay. The behavior segments do not need to change that often. They can be used for a while, so if a user moves from a low-engagement segment to a high-engagement segment, we know that. Same if they reduce engagement and move down in segments; we also know that. So it’s consistent, and we are able to keep using the same segmentation model for a while to keep track of user movement.
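Because the segments stay fixed over time, tracking movement can be as simple as comparing a user’s segment between two snapshots. The sketch below assumes the segments can be ordered by engagement; the segment names, their ordering, and the user data are hypothetical.

```python
# Hypothetical ordering of segments from least to most engaged.
SEGMENT_RANK = {"low_engagement": 0, "medium_engagement": 1, "high_engagement": 2}

def movement(previous_segment, current_segment):
    """Classify a user's move between two snapshots as up, down, or stable."""
    delta = SEGMENT_RANK[current_segment] - SEGMENT_RANK[previous_segment]
    if delta > 0:
        return "up"
    if delta < 0:
        return "down"
    return "stable"

def track_movements(previous, current):
    """Compare two snapshots; users missing from the earlier snapshot are skipped."""
    return {
        user: movement(previous[user], segment)
        for user, segment in current.items()
        if user in previous
    }

last_month = {"u1": "low_engagement", "u2": "high_engagement", "u3": "medium_engagement"}
this_month = {"u1": "high_engagement", "u2": "medium_engagement",
              "u3": "medium_engagement", "u4": "low_engagement"}

moves = track_movements(last_month, this_month)
print(moves)
```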
And again, marketers should be able to leverage this system to maximize on different goals. This is done through the offer rules. This is a concept that we’re introducing into this process: when marketers set up the rules, they are high level and are used as guidance for the final offer set, but marketers don’t really need to go down to the details. They don’t need to pick 10 final offers and keep maintaining that list. They set up the rules based on certain criteria, and we allow the recommender to pick the best offers on the server. Fast deployment and easy maintenance, that’s the last point. We are building this unified pipeline on a single Xeon cluster using Apache Spark and Analytics Zoo. And Kai from Intel will talk in more detail about that part. Thank you.
Kai Huang: Okay. Thanks, Lu, for introducing the details of the offer recommendation system at Burger King. Intel and Burger King have been cooperating for a long time, and we have already successfully finished several recommendation use cases on Burger King’s big data clusters.
So for this offer recommendation use case, as you may notice, there are quite a lot of components involved, and the entire pipeline is a little bit complicated. So it is really beneficial to have a unified solution that is efficient and easy to implement and maintain.
Normally, when building a recommendation system or an end-to-end pipeline, developers first finish a prototype on their laptop using some sample data, followed by experiments on clusters with historical data. If everything is okay, they finally deploy the distributed data pipeline into the production environment. For these three steps, we aim to build a platform that helps developers easily prototype the pipeline, directly apply AI models to big data with almost zero code changes when scaling from a laptop to a distributed environment, and seamlessly deploy the entire pipeline on production clusters, for example, on a Hadoop or YARN cluster or a Kubernetes cluster.
So Analytics Zoo is the platform that we provide to achieve these targets, and Burger King has leveraged Analytics Zoo to improve their recommendation system. Analytics Zoo is a software platform for big data AI, open sourced by Intel, that can seamlessly scale from a single laptop to big data clusters or to the cloud.
With Analytics Zoo, users can directly run TensorFlow, Keras, PyTorch, or MXNet on top of Apache Spark. That is to say, after using Spark for large-scale data processing, the data can be fed directly to distributed model training on the same cluster. On top of the data analytics and AI pipelines, we provide high-level ML workflows to automate the tasks for building these end-to-end pipelines.
Vertically, Analytics Zoo provides a bunch of built-in feature engineering operations, deep learning models, and use cases for a variety of common scenarios that users can refer to as out-of-the-box solutions. In addition, Analytics Zoo supports directly loading TensorFlow, Keras, or PyTorch models for distributed training and inference, so users with experience in these popular frameworks can easily apply their deep learning applications to production data with minimal code changes and little deployment effort on big data clusters. Here’s the link to Analytics Zoo. If you are interested in our project, you can take a look at it.
Okay. The figure on this page shows the overall architecture of the deployment of Burger King’s offer recommendation system. On the top right, we can see that offline training is conducted on a single YARN cluster. Spark is used for ETL and data processing, the preprocessed customer behavior data is fed into Spark MLlib, and K-Means is used for user segmentation.
On the other hand, the pre-trained BERT and ResNet models are loaded into Analytics Zoo for distributed feature extraction of the offer descriptions and images, respectively. The extracted features are saved into the KV store for online use. The feature representations of the offers are combined and, through the auto encoder, used as the input to the TXT model. The TXT model is also trained in a distributed fashion to rank the candidate offers.
The distributed training is conducted on the same single cluster where the data is stored and processed, so there is no extra data transfer overhead and no extra effort needed to maintain separate workflows or systems. So this solution should be very efficient. The trained model is saved into the registry for production use.
And for online serving, see the bottom part of the figure. Burger King uses a Kubernetes cluster, and they use the Play Framework for building web applications. Analytics Zoo provides a POJO-style InferenceModel API for real-time model serving without stack dependencies. We leveraged the OpenVINO toolkit [inaudible] and [inaudible] techniques to achieve a significant performance boost. When a user request comes in, an inference model runs within a Docker container to return the candidate offers to users with low latency. The clickstream data is collected and stored in the distributed file system and is used later for retraining the model periodically.
So this is basically the overall architecture of our recommendation system. The system contains a lot of components, and after introducing Analytics Zoo, the solution becomes unified, efficient, and easy to scale and maintain. Such a solution has proven to be really efficient and well organized in the production environment.
So that’s basically what we wanted to cover in this session. As a wrap-up, we introduced Burger King’s offer recommendation system, called DeepFlame. In the recommendation system, we use a hybrid approach that combines customer segmentation and deep learning to provide the best offer recommendations to the customers. The entire system is built on top of Analytics Zoo. Analytics Zoo is an open source project, and you can easily find us on our GitHub page.
Currently, we are working on a recommendation framework in Analytics Zoo that particularly focuses on optimizing end-to-end recommendation workloads for different scenarios. We would be glad to share our progress at future opportunities. Okay, thank you for your participation. If you have any questions, feel free to raise them. Thank you so much.
Kai Huang is a software engineer at Intel. His work mainly focuses on developing and supporting deep learning frameworks on Apache Spark. He has successfully helped many enterprise customers work o...
Luyang Wang is a Sr. Manager on the Burger King data science team at Restaurant Brands International, where he works on developing large scale recommendation systems and machine learning services. Pre...