
The current economic environment is having a significant impact on the Retail and Consumer Goods sector. Rapid changes in how consumers shop are forcing companies to rethink their sales, marketing, and supply chain strategies. Companies can still reduce costs and win market share to drive stronger growth, but this requires new ways of understanding and acting on the consumer. Through the use of big data and AI, retailers and consumer goods companies can refocus their efforts on areas that will rapidly deliver value and drive growth into the future.

For years, the Spark + AI Summit has been the premier meeting place for organizations looking to build AI applications at scale with leading open-source technologies such as Apache Spark™, Delta Lake and MLflow. In 2020, we’re continuing the tradition by taking the summit entirely virtual. Data scientists and engineers from anywhere in the world will be able to join June 22-26, 2020 to learn and share best practices for delivering the benefits of AI.

This year we have a robust experience for data teams in the Retail and Consumer Packaged Goods Industry. Join thousands of your peers to explore how the latest innovations in data and AI are providing new ways to optimize supply chains and connect more deeply with the modern shopper. Register for Spark + AI Summit and visit the Retail and CPG Lounge to take advantage of all the sessions and events.

Here is an overview of some of our most highly anticipated Retail and CPG talks at this year’s summit:

Starbucks

Keynote: How Starbucks is Achieving its ‘Enterprise Data Mission’ to Enable Data and ML at Scale and Provide World-Class Customer Experiences
A key aspect of ensuring the excellent customer experiences Starbucks is known for is data. Tremendous amounts of data. This keynote highlights how the company makes decisions powered by data at scale, including processing customer data on a petabyte level with governed processes, deploying platforms at the speed of business, and enabling ML across the enterprise. Join to learn the ins and outs of building a world-class enterprise data platform to drive world-class CX.

Columbia

Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with Delta Lake
Today, Columbia seamlessly integrates data from all line-of-business systems to manage its wholesale and retail businesses, but it wasn’t always so. In this presentation, hear how they achieved a 70% reduction in pipeline creation time and reduced ETL workload times from four hours to just minutes when they made the switch from multiple legacy data warehouses to Azure Databricks and Delta Lake, enabling a serious boost in efficiency and near real-time analytics.

Walmart

Building Identity Graphs over Heterogeneous Data
Customers and service providers interact in a variety of modes and channels, making a unified identity view an often complex challenge. Since every interaction or transaction event contains some form of identity, a highly scalable platform is required to identify and link the identities belonging to a single user as a connected component. Walmart solved this problem by building its Identity Graph platform using the Spark processing engine. Join this talk to hear how they were able to create a solution that handles 25+ billion vertices and 30+ billion edges and an incremental 200M new linkages every day.
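To make the "connected component" idea concrete, here is a minimal single-machine sketch using union-find. It is not Walmart's implementation, which the abstract describes as running on Spark at a far larger scale. The function name `link_identities` and the identifier strings are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def link_identities(edges):
    """Group identifiers into connected components with union-find.

    `edges` is an iterable of (id_a, id_b) pairs, each meaning the two
    identifiers were observed together in one interaction (an assumption
    made for this sketch).
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in edges:
        union(a, b)

    # Collect every identifier under its root to form the components.
    components = defaultdict(set)
    for node in list(parent):
        components[find(node)].add(node)
    return list(components.values())


# Hypothetical linkage events, not real data.
events = [
    ("email:a@example.com", "cookie:123"),
    ("cookie:123", "phone:555-0100"),
    ("email:b@example.com", "device:xyz"),
]
users = link_identities(events)
```

Each set in `users` holds the identifiers attributed to one user. The first two events share `cookie:123`, so they collapse into a single component, while the third event forms its own.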

Mars

Building the Petcare Data Platform using Delta Lake and ‘Kyte’: Our Spark ETL Pipeline
Mars Petcare is focused on building the Petcare Data Platform of the future to support analytics and customer insights for its many petcare brands such as Iams, Whiskas, Pedigree and Nutro. In this talk, learn how Mars leveraged Spark and Databricks to build ‘Kyte’, a bespoke pipeline tool that massively accelerated their ability to ingest, cleanse, and process new data sources. Hear more about why they chose a Spark-heavy ETL design and a Delta Lake-driven platform, and why they’re committing to Spark and Delta Lake as the core of their Platform to support their mission: Making a Better World for Pets!

iFood

Building a Real-Time Feature Store at iFood
iFood is the largest food-tech company in Latin America. To maintain their top-tier status, they’ve built several machine learning models to answer questions such as how long it will take for an order to be completed, which restaurants and dishes are best to recommend to a consumer, and whether a form of payment is fraudulent. To generate the training datasets for those models, and to serve features in real time so predictions can be made correctly, it’s necessary to create efficient, distributed data processing pipelines. In this talk, learn how iFood uses Databricks and Spark Structured Streaming to process event streams and store them in a historical Delta Lake table and a low-latency Redis cluster, and how they structure their development processes.

You can see the full list of talks on our Retail and Consumer Packaged Goods summit page.

Retail and Consumer Goods Forum

Join us on Thursday, June 25 at 11:30am-1:00pm PST for an interactive Retail and CPG Forum at Spark + AI Summit. In this free virtual event, you’ll have the opportunity to network with your peers and participate in engaging panel discussions with industry leaders on how data and machine learning are driving innovation across the entire retail value chain. Panelists include:

Robert Ruch
SVP, CIO

Brad Kent
VP, Analytics & Insights

Saritha Ivaturi
Director of Data Systems

Demos on Popular Data + AI Use Cases in Retail and CPG

Join us at Summit for live demos on the hottest use cases in the retail and consumer goods industry:

Demand Forecast
The computational limitations that forced companies to compromise their demand forecasting are a thing of the past. In this demonstration, we'll show you how to take advantage of the elastically-scalable patterns used by many companies to generate timely forecasts at levels of granularity that were out of reach in years past.

Safety Stock Analysis
A key application of demand forecasts is the calculation of the buffer inventory (aka the safety stock) required to ensure customer demand for goods is immediately met. This aspect of inventory management has become even more critical as traditional businesses pivot to curbside fulfillment and at-home delivery in the wake of the COVID crisis, with customer loyalty shifting to those retailers best able to deliver on the promises made through their online applications. In this demonstration, we'll examine how a common substitution made in the safety stock calculation puts demand fulfillment at risk.
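For readers new to the term, one widely taught textbook form of the safety stock calculation can be sketched as follows. It assumes daily demand is independent and normally distributed and that lead time is fixed. It does not reproduce the analysis in the demo, and the "common substitution" mentioned above is not specified here, so the sketch makes no attempt to show it. The function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def safety_stock(service_level, daily_demand_std, lead_time_days):
    """Textbook safety stock: z * sigma_daily * sqrt(lead time).

    Assumes independent, normally distributed daily demand and a fixed
    lead time; real demand often violates both assumptions.
    """
    # z-score for the target probability of not stocking out during lead time
    z = NormalDist().inv_cdf(service_level)
    return z * daily_demand_std * sqrt(lead_time_days)


# Illustrative numbers only, not taken from the demo.
units = safety_stock(service_level=0.95, daily_demand_std=10.0, lead_time_days=4)
```

With these illustrative inputs the buffer comes to roughly 33 units. Raising the target service level increases the z-score and therefore the buffer.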

Customer Lifetime Value
Maintaining a healthy, profitable relationship with customers requires an understanding of their individual revenue potential. Customer Lifetime Value (CLV) is a popular metric for capturing this potential, but in non-subscription retail models, determining probable future spend and variable retention rates is very difficult. In this demonstration, we will introduce the BTYD models as a means to overcome these challenges and provide reliable CLV estimates.

Customer Segmentation

Wrangling, analyzing, and systematically modeling time-series data for forecasting requires a unique set of techniques because of the temporal dependence inherent in time series. In this training, Walmart will discuss some of the most fundamental concepts and techniques for building and deploying time-series forecasts. Join to learn the key characteristics of time-series data, statistics for summarizing time series, graphical techniques to describe the characteristics of time series, and the essential concepts and techniques required to appropriately apply autoregressive-type and neural network models in practice.

Practical Problem-solving in Retail: Real-time Data Analytics with Apache Spark
Databricks Training

In this half-day course, you will learn how Databricks and Spark can help solve real-world problems you face when working with retail data. You’ll learn how to deal with dirty data, and get started with Structured Streaming and real-time analytics. Students will also receive a longer take-home capstone exercise as bonus content to the class where they can apply all the concepts presented. This class is taught concurrently in Python and Scala.

Sign up for the Retail and Consumer Goods Experience at Summit!

To take advantage of the full Retail and Consumer Packaged Goods Experience at Spark + AI Summit, simply register for our free virtual conference and select Retail and Consumer Packaged Goods Forum during the registration process. If you’re already registered for the conference, log into your registration account, edit “Additional Events” and check the forum you would like to attend.