Empowering industries with geospatial insights


Faster spatial querying, indexing and partitioning


Reduction in peak memory consumption as compared to other platforms

"As a data company, giving our customers access to our data sets is critical. The Databricks Data Intelligence Platform with Delta Sharing really streamlines that process, allowing us to securely reach a much broader user base regardless of cloud or platform."

— Felix Cheung, VP of Engineering, SafeGraph

SafeGraph gives businesses the data they need to make key decisions about everything from marketing to operations by delivering high-precision geospatial data sets that show where people spend their time and money. Working with petabytes of historical data, SafeGraph’s lean engineering team struggled with massive data volumes and incomplete data sets. The Databricks Data Intelligence Platform enables SafeGraph to optimize workflows and facilitate the exchange of hard-to-attain geospatial data sets and insights that help organizations across industries succeed.

Scaling geospatial data in parallel with real-world behavior

SafeGraph is a geospatial data company focused on aggregating complex data, including points of interest (POI), foot traffic and other data that contains no personally identifiable information (PII), to surface insights about different regions, buildings, communities and the people connected to them. These insights help customers across industries, including retail, financial services, real estate and government, figure out where and how to drive better business outcomes.

SafeGraph processes petabytes of geospatial data drawn from historical data sets. For a small data team, processing that volume of information while still accurately analyzing the nuances of each ingested data set was time-consuming and resource-intensive.

“Where is the data coming from, what are the different sources, what are the differences between them? That’s what we need to know to address mismatches and coding issues,” said Felix Cheung, VP of Engineering at SafeGraph. “The data sets are immense and always getting bigger, and while we have expertise on particular parts of the system, it’s essential that we all work together on building an accurate final data product — that’s what Databricks allows us to do.”

Streamlining and democratizing the geospatial data ecosystem

The team at SafeGraph found that cloud storage on its own was not an ideal place to write or manage massive data sets, so they turned to the Databricks Data Intelligence Platform. Delta Lake, an open-format storage layer that brings reliability, security and performance to the data lake, serves as the foundation of SafeGraph’s analytics system. It allows them to unify all their data sets and build scalable data pipelines to feed analytics and machine learning models.

As a data company, SafeGraph needs to share their data sets with customers and partners alike. They were an early adopter of Delta Sharing, an open protocol for the secure real-time exchange of large data sets, and are now able to securely exchange data with dozens of destinations without requiring recipients to deploy a specific platform first. This cuts access time from months to minutes and greatly reduces the work for data providers who want to reach as many users as possible.
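For readers curious what consuming a Delta Sharing table can look like on the recipient side, here is a minimal sketch using the open source `delta-sharing` Python connector. It is not SafeGraph's actual setup: the profile file name and the share, schema and table names are hypothetical placeholders, and a real recipient would receive a profile file from the data provider.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile-file>#<share>.<schema>.<table>' address
    that the delta-sharing connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame.

    The connector is imported lazily so the URL helper above can be
    used without the package installed (pip install delta-sharing).
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


# Hypothetical names, for illustration only.
url = table_url("config.share", "example_share", "places", "poi")
print(url)
```

Because the protocol is open, the recipient needs only the profile file and a client library, not an account on the provider's platform.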

With data feeding their predictive models to help businesses make smarter decisions, they use MLflow for model deployment, monitoring and performance tracking.

“Machine learning matters here,” said Cheung. “It’s an essential part of our architecture, enabling us to maintain the quality of our data even as it balloons. Additionally, our customers need to be able to access the data itself, so now we’re a partner with the Delta Sharing Initiative. We hope to reach a much broader user base, regardless of where our customers are, and that kind of collaboration and integration is something we’re keenly interested in, both internally and externally.”

Along with Databricks, SafeGraph uses Amazon Redshift to warehouse their data, Amazon Elastic Kubernetes Service (EKS) for containerized execution, and Elasticsearch for fuzzy queries that help with misspellings and other data quality problems.
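As an illustration of how a fuzzy query tolerates misspellings, the sketch below builds an Elasticsearch `match` query body with the `fuzziness` option. It only constructs the request body and does not contact a cluster. The field name `location_name` and the search text are hypothetical and not taken from SafeGraph's schema.

```python
import json


def fuzzy_match_query(field: str, text: str, fuzziness: str = "AUTO") -> dict:
    """Return an Elasticsearch query body that matches `text` against
    `field` while allowing small edit-distance differences."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" scales the allowed edit distance with term length.
                    "fuzziness": fuzziness,
                }
            }
        }
    }


# Hypothetical field name and a deliberately misspelled search term.
body = fuzzy_match_query("location_name", "Starbuks")
print(json.dumps(body, indent=2))
```

The resulting body could be sent to an index's `_search` endpoint with any Elasticsearch client.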

“As more data is ingested,” said Cheung, “we’re able to identify more unique links between data points, easily push everything to the customer cloud, and continuously improve the accuracy and depth of the ecosystem.”

Harnessing the combined power of data-driven decisions

With the ability to group data streams, apply hierarchical prioritization to data sets and optimize query patterns, SafeGraph can now quickly provide the detailed information their customers need to make proactive, strategic decisions about their businesses. With deeper collaboration and insights around current challenges, they can address what’s standing in the way today while laying the groundwork for future growth and value.

“With Databricks, the end-to-end optimization we’ve seen to our geospatial data system and its accuracy has been continuous,” said Cheung. “The benefits to our customers are clear, and SafeGraph is now in the ideal position to help businesses everywhere leverage geospatial data to keep pace with real-world demand and change.”