For fast food recommendation use cases, user behavior sequences and context features (such as time, weather, and location) are both important factors to be taken into consideration. At Burger King, we have developed a new state-of-the-art recommendation model called Transformer Cross Transformer (TxT). It applies Transformer encoders to capture both user behavior sequences and complicated context features, and combines the two Transformers through latent cross for joint context-aware fast food recommendations. Online A/B testing shows not only the superiority of TxT compared with existing methods, but also that TxT can be successfully applied to other fast food recommendation use cases outside of Burger King.
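The latent cross mentioned above can be illustrated with a minimal NumPy sketch: one encoder's context representation acts as an element-wise multiplicative mask over the other encoder's sequence representation. The function name, the `(1 + context)` form, and the dimensions below are illustrative (following the general latent-cross idea), not taken from the actual TxT implementation.

```python
import numpy as np

def latent_cross(sequence_repr, context_repr):
    """Combine two encoder outputs via latent cross:
    element-wise product of the sequence representation
    with a (1 + context) multiplicative mask."""
    assert sequence_repr.shape == context_repr.shape
    return sequence_repr * (1.0 + context_repr)

# Toy example with hidden size 4
seq = np.array([0.5, -1.0, 2.0, 0.0])
ctx = np.array([0.0, 1.0, -0.5, 3.0])
combined = latent_cross(seq, ctx)  # [0.5, -2.0, 1.0, 0.0]
```

The multiplicative (rather than additive) combination lets context features gate individual dimensions of the behavior-sequence representation, which is why it works well for strongly context-dependent domains like fast food ordering.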
In addition, we have built an end-to-end recommendation system leveraging Ray (https://github.com/ray-project/ray), Apache Spark and Apache MXNet, which integrates data processing (with Spark) and distributed training (with MXNet and Ray) into a unified data analytics and AI pipeline, running on the same cluster where our big data is stored and processed. Such a unified system has been proven to be efficient, scalable, and easy to maintain in the production environment.
In this session, we will elaborate on our model topology and the architecture of our end-to-end recommendation system in detail. We will also share our practical experience in successfully building such a recommendation system on big data platforms.
Speakers: Kai Huang and Luyang Wang
– Okay. Thanks Lu, for sharing the recommendation use case at Burger King and the TxT model architecture for context-aware recommendation. Since Burger King processes an enormous amount of transaction data on their big data clusters, it is essential to build a unified system for the entire recommendation pipeline. So in the remaining part of the session, I will elaborate on how we use Ray on Apache Spark to implement the distributed training pipeline for the TxT model. Well, first of all, I will briefly introduce the work that Intel has done over these years enabling the latest AI technologies on big data. In 2016, we open sourced our first project named BigDL, which is a distributed deep learning framework built on top of Apache Spark. Deep learning applications written in BigDL are standard Spark applications, and therefore can directly run on existing Hadoop/Spark clusters without modifications to the cluster. We have done loads of optimizations so that BigDL can achieve both outstanding performance and good scalability on CPU clusters. Later in 2018, we further open sourced Analytics Zoo, a unified data analytics and AI platform for distributed TensorFlow, Keras and PyTorch. With Analytics Zoo, TensorFlow or PyTorch users can seamlessly scale their AI workloads to big data clusters for their production data. We provide high-level machine learning workflows in Analytics Zoo, including cluster serving and AutoML, to automate the process of building large scale AI applications. There are a bunch of built-in models and out-of-the-box solutions in Analytics Zoo for many common fields, including recommendation, time series forecasting, computer vision, natural language processing, etc. Recently, we developed a new project in Analytics Zoo called Project Orca. Project Orca is designed to easily scale out single-node Python AI applications from notebooks across large clusters, by providing data-parallel pre-processing for common Python libraries.
It also provides scikit-learn style APIs for distributed training and inference. So we use the distributed MXNet training support in Project Orca for Burger King's recommendation system, which was implemented based on Ray and Spark. Also, in case some of you may not be familiar with Ray, I will talk about Ray for a little bit before I proceed. So Ray is a fast and simple framework open sourced by UC Berkeley RISELab to easily build and run emerging AI applications in a distributed fashion. Ray Core provides simple primitives and a friendly interface to help users easily achieve parallelism with remote functions and actors. Ray is packaged with several high-level libraries built on top of Ray Core to accelerate machine learning workloads. For example, Ray Tune can be used for large scale experiment execution and hyperparameter tuning. RLlib provides a unified interface for a variety of scalable Reinforcement Learning applications. The third one, RaySGD, is our focus today. RaySGD implements wrappers around TensorFlow and PyTorch to ease the deployment of data-parallel distributed training. If users want to directly use the native distributed modules of TensorFlow or PyTorch, they might have to go through complicated setup steps in the production environment, so RaySGD can help relieve the deployment effort to some extent. We adopted the design idea of RaySGD for MXNet, and we have done some further work to make our implementation fit the Spark data processing pipeline. Okay. So let's go into the implementation details of the distributed training pipeline on big data. We developed RayOnSpark to seamlessly integrate Ray applications into Spark data processing pipelines. As the name indicates, RayOnSpark runs Ray on top of PySpark, on Burger King's YARN cluster.
So first of all, for the environment preparation, we leverage conda-pack and the YARN distributed cache to automatically package and distribute all the Python dependencies on the driver node across the cluster at runtime, so that users don't need to install them on all nodes beforehand, and the cluster environment remains clean after the programs finish. In the Spark setting, as we all know, a SparkContext is created on the driver node, and Spark is used by Burger King to load data and perform data cleaning, ETL and pre-processing steps. Afterwards, RayOnSpark will create a RayContext object on the Spark driver as well. The RayContext will utilize the existing SparkContext to automatically launch Ray processes alongside the Spark executors in the same cluster. Additionally, we can see in the figure that a Ray manager is created within each Spark executor to automatically shut down the Ray processes and release the corresponding resources after the Ray applications exit. Similar to RaySGD, we implement a lightweight wrapper layer around the native MXNet modules to help handle the distributed setup of MXNet training on the YARN cluster. In our implementation, MXNet workers and parameter servers all run as Ray actors, and they communicate with each other through the distributed key-value store natively provided by MXNet. As you can see in the figure, we have Spark and Ray in the same cluster, and therefore Spark in-memory RDDs or DataFrames can be streamed into Ray's Plasma object store. Each MXNet worker can then take a data partition of the Spark RDD from its local Plasma object store for model training. For the coding part, Project Orca provides a scikit-learn style Estimator API for distributed MXNet training based on RayOnSpark. Users actually do not need to know much about the details of RayOnSpark.
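As a rough sketch, launching Ray alongside the Spark executors with RayOnSpark looks like the following. The API names come from Analytics Zoo, but the paths, conda environment name, and resource numbers are placeholders, and exact signatures may differ across versions; the imports are kept inside the function since they require Analytics Zoo to be installed.

```python
def start_ray_on_yarn():
    # Requires Analytics Zoo; imports deferred so this file parses without it
    from zoo import init_spark_on_yarn
    from zoo.ray import RayContext

    # Create a SparkContext on YARN; the local conda environment is packaged
    # with conda-pack and shipped via the YARN distributed cache at runtime
    sc = init_spark_on_yarn(
        hadoop_conf="/path/to/hadoop/conf",  # placeholder path
        conda_name="recsys",                 # placeholder conda env name
        num_executors=4,
        executor_cores=8,
        executor_memory="16g")

    # Launch Ray processes alongside the Spark executors in the same cluster
    ray_ctx = RayContext(sc=sc, object_store_memory="8g")
    ray_ctx.init()
    return sc, ray_ctx
```

After `ray_ctx.init()` returns, the same program can mix Spark data processing with Ray actors, which is what allows Spark RDD partitions to flow into the local Plasma object stores.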
They just need to import the corresponding packages in Project Orca, and create an OrcaContext for the cluster setup. When you create the OrcaContext, you specify the cluster mode to be YARN, and you can also specify the amount of resources to be allocated for your application, for example, the number of nodes to use in the cluster, and the number of cores and the amount of memory per node. Under the hood, the OrcaContext will prepare the runtime Python environment and launch the Spark and Ray clusters on YARN for you. After that, you can use the created SparkContext to do data processing with Spark, and finally you get the Spark RDDs for training and validation. The third step is to create the Estimator for MXNet in Project Orca. When creating an MXNet Estimator, in the train config you can specify the number of MXNet workers and parameter servers to launch. The code modifications and learning efforts should be minimal if you use Project Orca to scale out a single-node MXNet training script to large clusters since, as you can see in the code, you can directly input the model, loss and metrics defined in pure MXNet when initiating the MXNet Estimator. Here in our use case, we input the TxT model defined in MXNet. And since our recommendation use case is a next-item prediction problem, we use the softmax cross-entropy loss, and we choose Top-1 and Top-3 accuracy as the metrics. Invoking estimator.fit will launch the distributed MXNet training across the underlying YARN cluster, given the Spark processed RDDs and the number of epochs and batch size for training. So that's pretty much the code you need to write for the distributed MXNet training pipeline. It should be straightforward, and such distributed training runs on exactly the same cluster where the data is stored and processed, so there is no extra data transfer needed.
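Putting the steps above together, the driver program looks roughly like this. It is based on the Orca module in Analytics Zoo, but the config keys, resource numbers, and the exact `Estimator` signature are illustrative and may vary by version; the heavy imports sit inside the function so the sketch stands alone.

```python
def build_train_config(num_workers=4, num_servers=4,
                       batch_size=16000, epochs=2):
    # Training settings consumed by the MXNet Estimator (key names illustrative)
    return {"num_workers": num_workers,
            "num_servers": num_servers,
            "batch_size": batch_size,
            "epochs": epochs}

def run_pipeline(train_rdd, val_rdd, txt_model, config):
    # Requires Analytics Zoo with MXNet support; imports deferred
    from zoo.orca import init_orca_context, stop_orca_context
    from zoo.orca.learn.mxnet import Estimator
    import mxnet as mx

    # Step 1: launch Spark and Ray on YARN with the requested resources
    sc = init_orca_context(cluster_mode="yarn", num_nodes=4,
                           cores=8, memory="16g")

    # Step 2: Spark data processing would happen here, yielding the RDDs
    # passed in as train_rdd / val_rdd.

    # Step 3: create the Estimator with the model, loss and metrics
    # defined in pure MXNet, then train directly on the Spark RDDs
    est = Estimator.from_mxnet(
        model=txt_model,
        loss=mx.gluon.loss.SoftmaxCrossEntropyLoss(),
        metrics=[mx.metric.Accuracy(),                 # Top-1
                 mx.metric.TopKAccuracy(top_k=3)],     # Top-3
        config=config)
    est.fit(data=train_rdd, validation_data=val_rdd,
            epochs=config["epochs"], batch_size=config["batch_size"])

    stop_orca_context()
```

The point of the design is visible in the code: the model, loss, and metrics are plain MXNet objects, so a single-node training script needs only the context setup and the Estimator wrapper to become distributed.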
Previously, as Lu mentioned, when they used a separate GPU cluster for model training, nearly 20% of the total time was spent on the data transfer between the two separate clusters, which is quite expensive. After switching to the solution provided by Project Orca in Analytics Zoo, the entire pipeline becomes more efficient, scalable, and easier to maintain since it only needs a single cluster, and there is no extra labor needed to maintain a separate cluster. Burger King has successfully deployed the pipeline into their production environment to serve their customers. Okay. So here it comes to the end of this session. As a wrap-up, in this session we talked about the joint work of Burger King and Intel to build end-to-end context-aware fast food recommendation using RayOnSpark. We have released a paper and a blog for our cooperation, with the links shown here, and you may take a look at them later. If you want to know more about the technical details of RayOnSpark, we had a previous session at the Spark + AI Summit this June, which particularly talks about the implementation details, and you may take a look. If you want more information about Analytics Zoo, you can visit our GitHub page, and I think the other functionalities of Analytics Zoo could be useful to your work as well. If you have a GitHub account, don't hesitate to star our project Analytics Zoo on GitHub, so that you can find us whenever in need, and we can give you timely help and support. Also, we are now actively working with other industrial customers on more use cases with RayOnSpark, and we will be glad to share our progress in the future. Okay. That's pretty much what I want to talk about today. Feel free to raise questions if any, and thank you for attending this session. We hope what we have talked about has aroused your interest. Thank you so much, and have a good day.
Kai Huang is a software engineer at Intel. His work mainly focuses on developing and supporting deep learning frameworks on Apache Spark. He has successfully helped many enterprise customers work out optimized end-to-end data analytics and AI solutions on big data platforms. He is a main contributor to the open source big data + AI projects Analytics Zoo (https://github.com/intel-analytics/analytics-zoo) and BigDL (https://github.com/intel-analytics/BigDL).
Burger King Corporation
Luyang Wang is a Sr. Manager on the Burger King data science team at Restaurant Brands International, where he works on developing large scale recommendation systems and machine learning services. Previously, Luyang Wang worked at the AI lab at Philips and at Office Depot.