Engineering blog

Building Forward-Looking Intelligence With External Data

Tian Tan
Javier Soliz
Bryan Smith
Rob Saker

This post was written in collaboration with the Foursquare data team. We thank co-author Javier Soliz, sales engineer specializing in data engineering and geospatial analysis at Foursquare, for his contribution.

 
"In an interlocked global economy, triggering events can quickly set off a chain reaction," wrote Boston Consulting Group in early 2020 as the world grappled with the COVID pandemic. Already in the first few months of 2021, we have experienced wildfires in western Australia, winter storms causing millions to lose power for days in Texas, a powerful earthquake off the coast of Japan, flooding and evacuations in both eastern Australia and Hawaii, political unrest surrounding the U.S. presidential election and a single ship shutting down a major global shipping route between Europe and Asia – all while the world struggles to recover from a global recession triggered by the pandemic. With no shortage of triggering events, organizations are now investing heavily in resilience.

A common notion of resilience is a return to normalcy following a disruptive event. But as the COVID pandemic illustrates, what was normal before may not be normal after. We've seen a remarkable shift in patterns of consumer mobility and spending. Once the initial panic over shortages of staples such as toilet paper subsided, oat milk and sweatpants became the new must-have items. Businesses that could fulfill this demand through online purchasing, home delivery and curbside pickup saw significant growth, while others saw their share of the market decline. Emerging from the pandemic, even more shifts in consumer spending patterns are expected.

The bottom line for businesses is that the uncertainty that affects their internal operations also affects the consumers they serve. Organizations seeking resilience need not only an internal focus on performance management but an external focus on the markets within which they operate.

Building forward-looking intelligence

The Texas-based grocery chain, HEB, provides an excellent example of how organizations may balance an inward focus on performance management with an outward focus on risk detection. Leveraging methodologies that examine potential future scenarios to understand an organization's particular vulnerabilities, HEB was able to identify key risks to its organization well ahead of the pandemic. As the COVID crisis emerged, the grocer knew to be on the lookout for potential disruptions in regions critical to its supply chain and began the process of stocking up on essential items likely to be affected.

While a pandemic was not a specific threat identified by HEB, its assessment of its organization's vulnerabilities told it where to look for emerging threats. The signals needed to identify those threats would not be found in its internal data until the threat was already upon the organization, so it looked to outside information sources for the early warning it needed to put its planned response in motion. Many factors contributed to HEB's successful navigation of the early days of the COVID pandemic, but looking outside the organization for forward-looking signals was a key part of it. For its early, effective and ongoing efforts in managing the pandemic, HEB was recognized as the 2020 Grocer of the Year by GroceryDive, a leading trade journal.

Leveraging external data

The growing awareness of the need for organizations to look beyond their own four walls is driving a surge of interest in external data sources. A recent survey by Forrester indicates that 70% of organizations had acquired or were in the process of acquiring new external data assets, and another 17% reported intending to do so within the coming year. In response, a growing number of data providers, aggregators and marketplaces are making all types of information, such as weather data, more accessible. (See also alternative data.)

Figure 1. Commonly used external data from a report by McKinsey & Company

Effective use of such information requires careful consideration. Here are a few best practices:

Before acquiring external data, carefully consider the insights your organization wishes to obtain from it. A careful review of the terms and conditions associated with the data, as well as a consideration of how the data is sourced and how customers might respond to your company using it, should help you steer clear of potential problems.

If cleared for use, it is important to understand how the data is collected and prepared for distribution, how far back the data is available, and how fit it is for your organization's intended uses. Many data providers make both documentation and samples available for just this purpose.

Weigh the technical challenges of leveraging the external data sources. The volume of historical data and periodic updates, the frequency with which it is updated and the mechanisms by which data is made available are key considerations. Also determine how data assembled outside the organization may be reconciled with internally generated data. Differences in temporal and spatial levels of granularity, as well as different ways of expressing overlapping dimensions, may require the data to undergo significant processing to be made available for analysis. For many organizations, the physical and logical challenges of integrating external data necessitate the adoption of new, more flexible and more cost-effective data management approaches over classic data warehousing approaches developed for the analysis of operational information.

Ensure value is derived from the data on an ongoing basis. Careful documentation, education and evangelism, and ongoing utilization monitoring can help ensure the data earns its keep. Many larger data providers assist their customers with this and may be able to provide guidance and best practices. These suggestions and many others for the effective use of external data can be found in published guidance from both McKinsey and Forrester.
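
As a small illustration of the granularity mismatch mentioned above, the sketch below rolls daily counts from a hypothetical external feed up to ISO weeks so they can be compared with internal data reported weekly. The function name, the input shape and the numbers are assumptions made for this example, not part of any provider's API.

```python
from collections import defaultdict
from datetime import date


def daily_to_weekly(daily_counts):
    """Sum daily counts into (ISO year, ISO week) buckets."""
    weekly = defaultdict(int)
    for day, count in daily_counts.items():
        iso_year, iso_week, _weekday = day.isocalendar()
        weekly[(iso_year, iso_week)] += count
    return dict(weekly)


# Hypothetical daily visit counts from an external feed (made-up numbers).
daily_visits = {
    date(2021, 3, 1): 10,  # Monday of ISO week 9
    date(2021, 3, 7): 5,   # Sunday of ISO week 9
    date(2021, 3, 8): 7,   # Monday of ISO week 10
}

weekly_visits = daily_to_weekly(daily_visits)
print(weekly_visits)
```

Keying on the ISO year and week, rather than the calendar year, keeps weeks that straddle a year boundary in a single bucket.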

Examining foot traffic with Foursquare data

To further explore how external data may be employed, we partnered with Foursquare, a leading provider of location technology and data, to examine the impact of COVID on taco shops in the US.

Why taco shops? Like most quick service restaurants, these establishments are highly dependent on foot traffic, a key aspect of consumer engagement disrupted during the pandemic. They also tend to be smaller, independent businesses and, as some regional reporting has noted, are therefore more capable of adapting their business models in response to the pandemic. Finally, while this analysis can be applied to any number of businesses represented in the Foursquare dataset, two of our authors are from Texas, where tacos are a much-loved regional staple.

With foot traffic data collected through Foursquare's Pilgrim SDK and made available through its Places and Visits databases, we examined the visitation rates of customers to taco shops in various regions of the country. Leveraging population estimates from the US Census Bureau, we were able to see a clear picture of the regional importance of these establishments.

Figure 2. Visits to taquerias relative to population size, logarithmically scaled, for the years 2017 through 2020
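
The metric behind a map like this can be sketched as visits divided by population, followed by a base-10 logarithm. The example below is a minimal illustration under assumptions: the county keys and all numbers are invented, and the choice of log base is ours, since the post does not state how the scaling was done.

```python
import math

# Hypothetical county-level inputs; keys and figures are made up for illustration.
visits_by_county = {"county_a": 120_000, "county_b": 310_000, "county_c": 15_000}
population_by_county = {"county_a": 1_200_000, "county_b": 10_000_000, "county_c": 1_600_000}


def log_visits_per_capita(visits, population):
    """Return log10(visits / population) for counties present in both inputs."""
    scores = {}
    for county, n_visits in visits.items():
        pop = population.get(county)
        # Skip counties with no population estimate or no recorded visits,
        # since the ratio or its logarithm would be undefined.
        if not pop or n_visits <= 0:
            continue
        scores[county] = math.log10(n_visits / pop)
    return scores


scores = log_visits_per_capita(visits_by_county, population_by_county)
for county, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(county, round(score, 2))
```

Normalizing by population keeps large counties from dominating simply because more people live there, and the logarithm compresses the wide range of ratios so that regional differences remain visible on one color scale.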

To align the point locations of individual businesses with the county-level metrics provided by the US Census Bureau, we leveraged the Uber H3 grid system, which maps geographic locations to hexagonal grids of varying resolutions. This system made it easier for us to overlay additional datasets, such as county-level COVID case counts.
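
One way to picture this alignment is as a join on a shared grid-cell key: each shop's coordinates are mapped to a cell, and a cell-to-county lookup attaches the county. The sketch below is an assumption-laden illustration, not the code from our notebooks. To stay dependency-free it uses a simple square grid (`square_cell`) as a stand-in for H3, and the shop coordinates, county names and lookup table are hypothetical.

```python
import math
from collections import Counter


def square_cell(lat, lng, size_deg=0.5):
    """Stand-in indexer: snap a point to a square grid cell id.

    This is NOT H3. With the h3-py package (v4) installed, an H3 indexer
    such as `lambda lat, lng: h3.latlng_to_cell(lat, lng, 7)` could be
    passed to count_shops_by_county instead.
    """
    return (math.floor(lat / size_deg), math.floor(lng / size_deg))


def count_shops_by_county(shops, cell_to_county, to_cell=square_cell):
    """Count shops per county by joining on the grid cell of each shop."""
    counts = Counter()
    for shop in shops:
        cell = to_cell(shop["lat"], shop["lng"])
        county = cell_to_county.get(cell)
        if county is not None:  # shops outside the covered cells are skipped
            counts[county] += 1
    return counts


# Hypothetical lookup from grid cell to county (in practice, county
# boundaries would be covered with many cells).
cell_to_county = {
    square_cell(30.2, -97.7): "county_a",
    square_cell(29.4, -98.5): "county_b",
}

# Hypothetical shop locations.
shops = [
    {"name": "shop_1", "lat": 30.27, "lng": -97.74},
    {"name": "shop_2", "lat": 30.40, "lng": -97.60},
    {"name": "shop_3", "lat": 29.42, "lng": -98.49},
    {"name": "shop_4", "lat": 40.00, "lng": -100.00},  # no county in the lookup
]

shops_per_county = count_shops_by_county(shops, cell_to_county)
print(dict(shops_per_county))
```

Because every dataset is reduced to the same cell identifiers, adding another county-level source, such as case counts, becomes one more lookup on the same key.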

Our analysis shows that while the number of taco shops has been increasing over the last few years, customer visits per restaurant were already declining prior to the COVID pandemic. While the vast majority of restaurants are independent, the bulk of the traffic to taco shops was captured by chain establishments.

Figure 4. Per location customer visits for independent vs. chain taquerias

With the emergence of COVID in early 2020, a sharp initial dip in visits was followed by a return of customers to stores in May, at about 75% of the levels seen across prior years.

Figure 5. Impact of COVID on store visitations

Examining year-over-year numbers, the independent restaurants appear to have recovered better than chains following this initial dip. As reported in other venues, the agility of smaller, independent establishments may account for some of their better rebound. Shop-local efforts may also have contributed to the pattern, with customers favoring neighborhood establishments over larger chains. But independent restaurants had also seen better year-over-year visitation numbers relative to chains just prior to the pandemic, indicating that the forces favoring them in 2020 predate the pandemic.

Figure 6. Year-over-year changes in shop visits for independent vs. chain restaurants
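
A year-over-year comparison of this kind can be sketched as the fractional change in visits relative to the same month one year earlier. The function below is a minimal illustration under assumptions: monthly granularity, the input shape and the numbers are all hypothetical and do not reproduce the values in the figure.

```python
def year_over_year_change(monthly_visits):
    """Map (year, month) to the fractional change vs. the same month a year earlier."""
    changes = {}
    for (year, month), visits in monthly_visits.items():
        prior = monthly_visits.get((year - 1, month))
        if prior:  # skip months with a missing or zero prior-year baseline
            changes[(year, month)] = (visits - prior) / prior
    return changes


# Hypothetical monthly visit totals for one group of restaurants.
monthly_visits = {
    (2019, 4): 1000,
    (2019, 5): 1000,
    (2020, 4): 400,
    (2020, 5): 750,
}

yoy = year_over_year_change(monthly_visits)
for period, change in sorted(yoy.items()):
    print(period, f"{change:+.0%}")
```

Comparing each month with the same month of the prior year, rather than with the preceding month, removes most of the seasonal pattern from the comparison.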

This is a positive bit of news for these small businesses, which have been losing ground to chain restaurants. Looking ahead, we forecast continued overall improvements in visitation numbers, which should provide good news for independents and chains alike. That said, these projections depend on reliable forecasts of COVID numbers, something that has eluded public health experts to date. In our analysis we made what we felt was a reasonable projection for a limited period of time, but in the end we found that forecasts were only reliable for a 2-3 month horizon. All of this is to say that there are still many unknowns, and while we are hopeful for a recovery, this is a scenario that will need to be frequently revisited as new information becomes available. Based on our experience with other QSRs and retailers, we believe this same caveat applies broadly across the industry.

Figure 7. Historical and forecasted store visits for subset of regions for which forecasts could be made

To examine our analysis in more detail, including the data preparation work required to spatially align our datasets, please explore the following notebooks:

Databricks and Foursquare would like to extend our best wishes to all the local restaurateurs and their employees who have navigated, and continue to navigate, the uncertainty of the pandemic. Please remember to support your local restaurants.