The SMART Forecasting team at Walmart Labs has built an innovative, cloud-agnostic, scalable platform to improve Walmart’s ability to predict customer demand while improving item in-stocks and reducing food waste. Over a period of two years, all of Walmart’s key departments in the US, Canada and Mexico have adopted our forecasting solution, with planned extensions to other Walmart-operated international markets. Over 100M store-item combinations are forecasted every week for the next 52 weeks. We continue to enhance our modelling suite for COVID impact, pricing in international markets, and weekend sales corrections. We will present a general overview of our scaled forecasting solution and follow it with a concrete use case for in-week adjustments, which provides consistent business value for produce and is currently being scaled out to more Walmart departments.
Divya Hindupur: Hello everyone. I am Divya Hindupur. I’m a data scientist in the smart forecasting team at Walmart. I have Jay Kakkar with me, who is a software engineer and my teammate in the smart forecasting team at Walmart.
Welcome to our session. Today, we are going to talk about weekday demand sensing at Walmart. So let’s begin with a quick introduction of Walmart. We are the largest grocer in the US, and it takes 2.3 million employees to keep this gigantic business a success. We are the number one Fortune 500 company with over $500 billion in annual revenue. With 11,300 stores worldwide, we have a deep geographic penetration. More than 90% of the US population lives within 10 miles of a Walmart store. We are a massive retail business, and one of our biggest technical challenges is the scalability of our models and data pipelines. Highly optimized and efficient technologies, like Spark, are very important to us.
The smart forecasting team at Walmart is motivated to drive operational efficiency through improvement in the ability to predict customer demand. Our primary clients are demand managers who use our forecasts to plan inventory. We have built a forecasting platform consisting of a suite of models forecasting for every scenario and a wide range of items ranging from bananas to bottled water to bicycles. Besides, we have also built a UI on top of the platform. There, the demand managers can view the forecast and several key business metrics that are driven by the forecast. Our forecasts have been [inaudible] the supply chain by providing demand forecasts every week for the next 52 weeks for more than a hundred million store-item combinations. The zero to six week horizon forecasts are used for inventory control and the higher horizon forecasts are used for production planning and purchase orders.
Now, moving on to the problem that we are going to discuss today. We’ll begin with the motivation for the problem, that is, why weekday demand sensing became important to us. Then we will go into the details of the model that was built to solve the problem. Next, we will move on to some key results of the model. And finally, we will talk about the implementation and scale of the model. So let’s begin with the motivation of the problem.
Our forecasting models are trained at scale every week, and weekly forecasts for all items and for all stores are delivered every Monday. Our downstream systems require the forecast to be ready by Saturday, so we start training our models as soon as we receive the Friday sales data. In other words, to deliver forecasts on Monday, we need to have the forecast ready by Saturday. And to generate the forecast by Saturday, we train the model on the data available until Friday. As you can see, our forecasting models will not have incorporated the weekend data when forecasting for the upcoming week. At this point, we might think that retraining the models with the Saturday and Sunday sales might be an option, but it would be very expensive and risky to wait for the Saturday and Sunday sales to roll in, then train all the different models on Monday, and then deliver the forecast on the same day.
Now to address this, the demand managers used empirical replenishment rules over the weekend sales data to make adjustments to our smart forecasts that are practical for each store. Now with 11,300 stores to account for, you can imagine how tedious the task would be. So here they saw an opportunity to enhance that forecast by sensing the weekday demand using weekend sales. Once again, retraining the models on Monday to include the Saturday and Sunday sales would be expensive. Therefore, we built a new lightweight model, denoted as the in-week adjustments model, that would run every Monday to enrich the forecast with weekend sales information. Besides leveraging replenishment rules, the in-week adjustments model uses historical sales patterns of an item in a store to predict demand and help estimate forecast adjustments specific to the store-item.
Now, let’s look at some more details about the in-week adjustments model. The in-week adjustments model is a simple linear model to predict demand and introduce weekday forecast enhancements. The model adjusts the horizon 0 smart forecast for Tuesday to Friday based on Saturday and Sunday sales. The intuition behind this is that if a product sells higher than forecasted on the weekend, we could expect the remaining week’s sales to be higher as well. The results from this algorithmic approach have been readily adopted by our business partners and have consistently delivered business impact over the past year. In addition to boosting the quality of the demand forecast, this algorithm has helped reduce the amount of forecast adjustment required to be made by the busy demand managers, and it has done so without adding any additional ETL overhead.
So let’s look into the details of the model. To train the model, we first pull the historical forecast for all the target categories. And we also pull the last 52 weeks of sales data for all item-store combinations in the target categories. We observed that for our model, the last 52 weeks of data was enough to achieve the desired accuracy. Next, we use the sales data to compute the daily sales percentage of each store-item combination. That is, we compute the proportion of sales that happen on Monday, Tuesday, Wednesday, and so on, for each of the store-items. This step is really important. Using the daily sales percentages, we are able to filter out special or outlier situations. For example, if Halloween fell on a Sunday and we saw a huge spike in the sales of pumpkins, we would like to exclude such examples from our training set. Because if we had high sales for pumpkins on Sunday and Halloween is over by the end of the weekend, we do not want to bump up the following weekday sales to adjust to that level. So that is why we use robust estimators on the daily sales percentages computed in this step to automatically remove outlier weeks, such as national holidays, promotion events, et cetera, from our training set.
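To make the outlier step concrete, here is a minimal sketch of how daily sales percentages and a robust filter could fit together. The talk does not name the robust estimator that was used, so the median absolute deviation (MAD) below is an assumption, as are the function names, the `threshold` value, and the choice to check one day at a time.

```python
from statistics import median

def daily_share(week_sales):
    """Return each day's share of the week's total sales."""
    total = sum(week_sales)
    if total == 0:
        return [0.0] * len(week_sales)
    return [s / total for s in week_sales]

def outlier_weeks(weeks, day, threshold=3.5):
    """Return indices of weeks whose sales share on `day` is unusual.

    Assumption: a modified z-score built from the median and the median
    absolute deviation (MAD) stands in for the unspecified robust estimator.
    """
    shares = [daily_share(w)[day] for w in weeks]
    med = median(shares)
    mad = median([abs(x - med) for x in shares])
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged
    flagged = []
    for i, x in enumerate(shares):
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        if 0.6745 * abs(x - med) / mad > threshold:
            flagged.append(i)
    return flagged
```

With six weeks of daily sales for one store-item, where one week has a large Sunday spike, `outlier_weeks(weeks, day=1)` would return the index of that week so it can be dropped from the training set.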
Once that’s done, we compare the historical forecast with historical sales to determine store-item-week combinations where we were underselling or overselling on weekends compared to the forecast. We filter out items that may not actually need any forecast adjustment based on historical forecast and sales patterns. And with the remaining items, we train a linear model to predict demand as a function of Saturday quantity, Sunday quantity, and the smart forecast. With this step, we are done training the model.
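As an illustration of the model form described here, the sketch below fits an ordinary least squares regression of demand on Saturday quantity, Sunday quantity, and the smart forecast. It is a sketch under stated assumptions, not the team's implementation: the intercept term, the function names, and the use of plain least squares via the normal equations are all hypothetical choices, and the talk does not say how the target demand is aggregated.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Each row of X is [1.0, saturday_qty, sunday_qty, smart_forecast];
    y holds the observed demand for the same store-item-week.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def predict(coef, saturday_qty, sunday_qty, smart_forecast):
    """Score one store-item with the fitted coefficients."""
    return (coef[0] + coef[1] * saturday_qty
            + coef[2] * sunday_qty + coef[3] * smart_forecast)
```

A model this small is cheap to refit and score, which fits the goal of a lightweight Monday run; at production scale a library regression would be the more natural choice.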
So with the model trained, we are ready for scoring. We pull the current week’s sales data, including Saturday and Sunday sales. We also pull some additional information, such as the on-hand quantity, the Saturday stock, the quantity received at the stores on the weekend, and the active promotions at the store. These additional quantities help us filter out stores that may not need a forecast adjustment to trigger replenishment activity at the store. Then, we score the store-item combinations in the current week using the model. We define the forecast adjustment as the scored prediction minus the forecast. At this point, we have a forecast adjustment for each store-item combination, but not all of these combinations might need an adjustment. So we accept the new prediction as the new forecast only if the adjustment suggests adjusting the store inventory, based on the current on-hand quantity and the case pack size of the item in the store.
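The acceptance step can be sketched as follows. The adjustment definition (prediction minus forecast) comes from the talk, but the exact acceptance rule is not given, so the rule below is a hypothetical stand-in: accept the new prediction only when it changes the number of case packs needed to cover the shortfall against on-hand inventory. All function and parameter names are illustrative.

```python
import math

def cases_needed(forecast, on_hand, case_pack):
    """Whole case packs required to cover demand beyond on-hand stock."""
    shortfall = max(0.0, forecast - on_hand)
    return math.ceil(shortfall / case_pack)

def apply_adjustment(smart_forecast, prediction, on_hand, case_pack):
    """Return (new_forecast, adjustment) for one store-item.

    Assumed rule: keep the original forecast unless the prediction
    would change the number of case packs to order.
    """
    adjustment = prediction - smart_forecast
    changes_order = (
        cases_needed(prediction, on_hand, case_pack)
        != cases_needed(smart_forecast, on_hand, case_pack)
    )
    new_forecast = prediction if changes_order else smart_forecast
    return new_forecast, adjustment
```

For example, with 40 units on hand and a case pack of 12, moving the forecast from 100 to 130 changes the order from 5 cases to 8 and is accepted, while moving it from 100 to 95 still rounds to 5 cases and is ignored.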
So now let’s look at some key results from this model. To evaluate the model, we performed a comprehensive back-test over a period of 12 weeks for all categories in the produce and grocery departments. We calculated the forecast error metric for the forecast with the in-week adjustments and for the forecast without the in-week adjustments. We observed that for more than 70% of the categories, there was an improvement in the forecast error. That is, the error was reduced due to in-week adjustments. In the plot below, we have on the y-axis the basis point change in the error metric. A negative basis point change indicates a reduction in the forecast error, and that’s an improvement in forecast accuracy. On the x-axis, each of the bars indicates one category in the produce and grocery departments. As you can see, for more than 70% of the categories the in-week adjustment improves the forecast, with a few categories showing an improvement as high as 300 basis points. And each basis point of improvement translates to several thousands of dollars saved for Walmart. We do have a few categories that did not improve, and these categories comprise items with sharp, seasonal patterns where the [inaudible] ramps up and ramps down very quickly in a span of two to four weeks.
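For readers less familiar with basis points, the comparison above can be reproduced with a few lines. The talk does not name the error metric, so the weighted MAPE used here is an assumption, and the function names are illustrative. One basis point is one hundredth of a percentage point, so a drop in error from 30% to 27% is a change of -300 basis points.

```python
def wmape(forecasts, actuals):
    """Weighted MAPE: total absolute error divided by total actual sales."""
    abs_error = sum(abs(f - a) for f, a in zip(forecasts, actuals))
    return abs_error / sum(actuals)

def bps_change(error_with, error_without):
    """Change in the error metric in basis points (negative is better)."""
    return (error_with - error_without) * 10_000
```

Computing `bps_change` once per category, with and without the in-week adjustment, gives the kind of per-category values that the bar chart describes.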
At this point, we could further improve the model, but given its benefit for the majority of categories, it’s important for us to expand it to more departments. Further, in-week adjustments have provided consistent error improvement week on week. Here is an example of one such item where the forecast error has consistently improved in all weeks. The plot below shows on the y-axis the forecast error and on the x-axis the series of weeks and [inaudible] as numbers. The green line shows the error for the forecast without the in-week adjustment and the purple line shows the error with the in-week adjustments. So you can see the forecast with in-week adjustment has lower error than the forecast without, especially in weeks 39 to 45. These results prove that the model will be very beneficial for the business, and hence, we plan to expand the model to more departments and markets. Now, Jay will show us the implementation and scale of the model.
Jay Kakkar: Thank you very much, Divya. Let’s dive into the implementation of the in-week adjustment model and discuss the future of this project. Currently, we are only training the linear model on the produce department and have seen the positive impact that the project had. And so our aim is to try and expand the scope of this project to several other departments and markets. However, the current implementation is not sufficient for this aim for a few reasons.
Firstly, all of the sales data, forecast data, and other input data is stored across HDFS, Teradata, and an NFS drive. This means that we have no centralized source of data, and this prevents us from scaling up this project to different departments due to the overhead required to maintain all of these data sources, as well as the added complexity that this infrastructure brings in preparing the data for the training phase. Specifically, the tables that are stored on HDFS and Teradata need to be pulled into our servers every time we run the model, and they cannot be extracted with one query. Moreover, to keep these tables updated, we need to read in, modify, and then overwrite the entire table, which is slow and inefficient, especially when we consider that we want to expand the use of this model to multiple departments.
Additionally, the model runs on a single server, parallelized across 56 cores. With the large department-level data we are training and scoring these models on, this will not allow us to scale to multiple departments reasonably. Since we use Saturday and Sunday sales combined with the forecast for the current week as the input to our model, as Divya discussed earlier, the model needs to be run on Monday in order to get an adjusted forecast to demand managers in time. If we do this for multiple departments and multiple markets, this will not be manageable. And for these reasons we have been working on moving this project to Spark so that we can mitigate these issues and scale efficiently.
There are two principal reasons we decided to refactor this project onto Spark. Firstly, we can leverage cloud storage. Specifically, we will now have a unified data storage solution in the form of blob storage as well as Hive delta tables, which enables us to keep our data updated and refreshed on a more regular basis, far more efficiently and simply than in the current data storage structure we are using. This is because we now have the ability to simply add data without overwriting the entire table it is being added to. And all of our data can be accessed with simple queries where all of the data stays in the cloud, so we will not have to pull any of it to a local server.
Secondly, we will be able to take advantage of the fact that Spark is highly optimized to partition the data by [inaudible], allowing the training and scoring runtimes to be drastically reduced and improving efficiency. Spark DataFrames give us the ability to run far more optimized queries than using [inaudible] data frames on our local servers, on top of the fact that Spark’s memory management is more optimized for these queries too. Additionally, we will also use the Parquet format instead of the CSV format, which will improve the runtime of our models and allows us to simply append the data to certain partitions of our tables rather than overwriting the whole table every time we edit it. By having a simple and easy-to-maintain data storage solution, coupled with the improved runtime offered by transitioning to Spark, we will now be able to train and score for more departments as well as more markets every Monday.
So to summarize the planned implementation, our data storage solution will be to use blob storage with Parquet files partitioned by item. We can optimize this partitioning for the training and scoring stages specifically to improve overall runtime. Then, when it comes to the model training and scoring, we can exploit the Spark environment to run highly parallelized models using PySpark DataFrames. And finally, the model outputs will be saved to Parquet files and will then be viewable by demand managers through our smart forecasting platform in a user-friendly UI.
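The append-instead-of-overwrite pattern described here might look like the following with the PySpark DataFrame writer. This is a hedged sketch: the function name, the `item_id` column name, and the storage path are hypothetical, and `df` is assumed to be a `pyspark.sql.DataFrame` holding only the new rows.

```python
def append_partitioned(df, path, partition_col="item_id"):
    """Append new rows to a partitioned Parquet dataset.

    `df` is assumed to be a pyspark.sql.DataFrame. Using mode("append")
    writes new files under the matching partition directories and leaves
    existing files untouched, so the table is not rewritten.
    """
    (df.write
       .mode("append")
       .partitionBy(partition_col)
       .parquet(path))
```

A weekly job could call this with the weekend sales rows and a blob storage path, and downstream readers can then use `spark.read.parquet(path)` with a filter on the partition column to read only the partitions they need.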
To conclude, the in-week adjustments project has been hugely successful since its implementation in March of 2019. It has delivered hundreds of basis points of improvement week on week, and has helped reduce food waste and improve customer availability. We would not be able to carry out the expansion process without Apache Spark. We now look forward to the future of this project and are excited to try and impact more departments across Walmart with the algorithm that we’ve developed. So thank you very much for having us and thank you for all of your time. Please don’t forget to give us your feedback. Thank you.
Divya Hindupur is a Data scientist on Walmart's Smart Forecasting team. She solved demand forecasting problems on price and event sensitivity of item demand. She graduated from Columbia University wit...
Jay Kakkar is a software engineer on Walmart's Smart Forecasting Team. He comes from UC Berkeley, where he focused on statistics and machine learning theory and was a teaching assistant.