CUSTOMER STORY

Creating safer roadways for U.S. drivers

Texas A&M Transportation Institute unlocks car sensor data on the Databricks Data Intelligence Platform

100s of terabytes of spatial data unified for better roadways

Faster time-to-insights to support optimized transportation safety

INDUSTRY: Public Sector
CLOUD: Azure

“The Databricks Data Intelligence Platform helps us easily ingest and collaborate on large datasets with trillions of GPS points. Now that we’re effectively visualizing and analyzing connected vehicle and geospatial data, we can continue innovating on transportation safety without limitations.”

— Michael Martin, Associate Research Scientist, Texas A&M Transportation Institute

Improving transportation requires innovation and practicality to continually reinforce safety despite external factors like vehicle performance, weather and traffic. For 70 years, the more than 400 researchers at the Texas A&M Transportation Institute (TTI) have been advancing transportation optimization across the U.S. and in 53 countries. TTI serves transportation agencies with in-depth, targeted research on everything from pavement and travel averages to crash analytics and speed compliance, generating the data and analytics agencies need for practical, implementable products and strategies. As the number of connected vehicles and the volume of IoT data they generate grew rapidly, TTI's legacy data infrastructure struggled to provide the compute resources, data security and collaboration at scale needed to make informed decisions. TTI standardized on the Databricks Data Intelligence Platform to simplify and accelerate data processing in a consolidated cloud environment with advanced analytics tooling. Now able to organize and process terabytes of raw IoT and geospatial data quickly, TTI efficiently produces targeted studies, planning models, inventories and surveys to support roadway safety with precision.

Legacy data stack blocks the path to safer roadways

Research organizations need to give their data science teams full use of all available information so they can develop reliable outputs for clients and internal needs. The Texas A&M Transportation Institute understands that the data, analytics, recommendations and strategies it delivers to transportation agencies across the U.S. depend on its ability to operationalize the raw data it ingests. Today, connected vehicle information, generated by IoT sensors built into vehicles, traffic cameras and more, yields trillions of data points that must be combined with external data sources such as roadway infrastructure and weather information to produce actionable insights for departments of transportation, metropolitan planning organizations and the Federal Highway Administration (FHWA). Unfortunately, TTI was limited to PCs and ad hoc solutions because its legacy software could not scale to efficiently analyze the ever-larger sets of geospatial and IoT data captured daily.

Michael Martin, Associate Research Scientist at TTI, explains: “We ingest enormous amounts of streaming data. For example, every three seconds, a GPS point is transmitted from millions of vehicles along with the attributes assigned to each point — including driving direction, speed, timestamps, latitude, longitude, etc. Capturing 19 months of data can easily surpass 150 terabytes or 1.2 trillion GPS points.” Before Databricks, TTI cobbled together various disconnected systems, creating data silos split along the programming languages preferred by each practitioner and team. The scattered environment complicated access, lineage and data tracking, while data teams were slowed by tools poorly suited to the type of data TTI works with. The rigidity of those tools also made it difficult to create the precise visualizations clients wanted in their desired formats.
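The volumes in Martin's quote can be sanity-checked with simple arithmetic. The sketch below is only a back-of-envelope illustration; it assumes decimal units (1 TB = 10^12 bytes) and an average month of 365.25 / 12 days, neither of which comes from TTI.

```python
# Back-of-envelope check of the volumes quoted above.
# Assumptions (not from the source): decimal units (1 TB = 10**12 bytes)
# and an average month of 365.25 / 12 days.
TOTAL_BYTES = 150 * 10**12            # "150 terabytes"
TOTAL_POINTS = 1_200_000_000_000      # "1.2 trillion GPS points"
MONTHS = 19
REPORT_INTERVAL_S = 3                 # one point every three seconds per vehicle

seconds = MONTHS * (365.25 / 12) * 24 * 3600

bytes_per_point = TOTAL_BYTES / TOTAL_POINTS       # 125.0
points_per_second = TOTAL_POINTS / seconds
# Average number of vehicles that must be reporting at any instant
# to sustain that rate at one point per REPORT_INTERVAL_S.
avg_reporting_vehicles = points_per_second * REPORT_INTERVAL_S

print(f"{bytes_per_point:.0f} bytes per point")
print(f"{points_per_second:,.0f} points per second")
print(f"{avg_reporting_vehicles:,.0f} vehicles reporting at any moment")
```

Under these assumptions the quote implies about 125 bytes per point and roughly 24,000 points per second, or about 72,000 vehicles reporting at any given moment. That figure is an average over the whole period, not the total number of vehicles in the feed.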

“We really maxed out the geospatial capabilities of our software, and we couldn’t gain efficiency within our existing infrastructure. Clients need a detailed picture of what’s happening on roadways to understand each factor’s statistical relationship with risk. That was hard to do with so many low-performing tools, and our hands were tied,” explains Martin. Lacking scalability and operational efficiency, TTI decided to modernize its environment to meet the next wave of data innovation in transportation.

Creating safer driving conditions with Databricks

Critical to TTI’s transformation was selecting a flexible solution with easy-to-use, advanced tooling that could unify large datasets while driving data team productivity. It was essential to remove the barriers preventing data teams from efficiently integrating IoT data from connected vehicles with external data sources like roadway crashes. After discovering the Databricks Data Intelligence Platform, TTI found the flexibility and functionality required to integrate and unify data across the organization. TTI migrated to the Databricks Platform, establishing a foundation for data science that enables the organization to grow alongside the evolution of data.

TTI replaced its legacy software with the Databricks Data Intelligence Platform on Microsoft Azure, gaining a central cloud environment where diverse groups can work collaboratively and concurrently, each in its preferred programming language. Martin describes how TTI is developing more sophisticated governance and security practices to support this collaboration while protecting sensitive data: “With Unity Catalog we can provide organization without being too controlling or burdensome, so we’re facilitating data access, lineage and data tracking to gain an understanding of where data comes from, what’s been done with it and how that connects to the outcomes we’re developing.” This allows TTI to layer different types of internal and external data for deeper analytics in support of specific client use cases.

Armed with large-scale geospatial processing on the lakehouse architecture, TTI has been able to ingest and store the trillions of GPS movement points that contribute to safety analysis for transportation optimization. With Delta Lake, TTI has accelerated code development in a more streamlined environment and, through integrations with Power BI and Tableau, can more easily visualize geospatial data in the formats clients prefer. To build and train its geospatial models, TTI uses the H3 grid indexing capabilities in Databricks. Martin explains, “It’s a flexible concept that lets you limit the amount of data you’re initially working with. So, we can build our processes of linear referencing between points and quickly sub-set to the location of interest. It’s reduced the time and expense we previously spent processing and helped distribute data to nontechnical teams.”
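Martin's description of H3 combines two steps: first narrow the data to grid cells around a location of interest, then apply linear referencing, which locates each GPS point as a distance along a road. The pure-Python sketch below illustrates that flow under loudly stated assumptions; it is not TTI's code. To run without extra packages it swaps H3's hierarchical hexagonal cells for a simple square latitude/longitude grid (on Databricks, the built-in `h3_*` SQL functions would fill that role), and the cell size, road geometry, coordinates and field names are all hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0
CELL_DEG = 0.01  # hypothetical cell size: about 1.1 km of latitude


@dataclass(frozen=True)
class GpsPoint:
    # Hypothetical record; a real feed carries more attributes
    # (timestamp, driving direction, ...).
    vehicle_id: str
    lat: float
    lon: float
    speed_mph: float


def cell_of(lat, lon):
    """Square-grid cell id: a stand-in for an H3 cell, not H3 itself."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_index(points):
    """Group points by cell so a query touches only a few buckets."""
    index = defaultdict(list)
    for p in points:
        index[cell_of(p.lat, p.lon)].append(p)
    return index


def subset(index, lat, lon, rings=1):
    """Points in the query cell plus neighbors up to `rings` cells away."""
    row, col = cell_of(lat, lon)
    return [
        p
        for dr in range(-rings, rings + 1)
        for dc in range(-rings, rings + 1)
        for p in index.get((row + dr, col + dc), [])
    ]


def to_xy(lat, lon, lat0, lon0):
    """Equirectangular projection to meters around (lat0, lon0);
    a reasonable approximation over a few kilometers."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def locate_along(road, lat, lon):
    """Linear referencing: return (measure, offset) in meters, where measure
    is the distance along the road polyline to the point's projection and
    offset is the point's distance from the road."""
    lat0, lon0 = road[0]
    verts = [to_xy(a, b, lat0, lon0) for a, b in road]
    px, py = to_xy(lat, lon, lat0, lon0)
    best_offset, best_measure = math.inf, 0.0
    traveled = 0.0  # road length before the current segment
    for (ax, ay), (bx, by) in zip(verts, verts[1:]):
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue
        # Projection parameter, clamped so the foot stays on the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len**2))
        offset = math.hypot(px - (ax + t * dx), py - (ay + t * dy))
        if offset < best_offset:
            best_offset, best_measure = offset, traveled + t * seg_len
        traveled += seg_len
    return best_measure, best_offset


# Hypothetical road and points near College Station, TX; v3 is in Austin.
road = [(30.600, -96.340), (30.610, -96.340), (30.620, -96.330)]
points = [
    GpsPoint("v1", 30.605, -96.3401, 41.0),
    GpsPoint("v2", 30.615, -96.335, 38.0),
    GpsPoint("v3", 30.2672, -97.7431, 25.0),
]
idx = build_index(points)
for p in subset(idx, 30.61, -96.34):  # v3 is never examined
    measure, offset = locate_along(road, p.lat, p.lon)
    print(f"{p.vehicle_id}: {measure:.0f} m along road, {offset:.1f} m off")
```

Because a query reads only the nine buckets around the location, the linear-referencing step runs on a small local subset rather than the full dataset. H3 refines the same idea with hexagonal cells, whose neighbors are all equally spaced, and multiple nested resolutions.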

By eliminating many of the operational disruptions that commonly hold organizations back, TTI has reached new levels of productivity, pace and efficiency across a range of use cases, including predictive crash analytics, parking and speed zone studies, updating planning models with telematics data, estimating traffic volumes and measuring precipitation with statewide weather radar data.

Safer roads, fewer accidents with Databricks

With the Databricks Data Intelligence Platform, TTI created a modern data infrastructure that supports the data needs of the organization now and into the future. Key to this success are the process improvements embedded in the Databricks Platform that enable scalability, functionality and speed. Since migrating, TTI can take on the trillions of spatial data points available from connected vehicles and integrate them smoothly to produce more specific results.

In an inherently complex environment, high-performance compute is expanding TTI's data capacity with fewer steps, tasks and manual interventions. Martin says, “It’s really eye-opening to see how much access we have to our cloud architecture. We don’t have to make requests and create bottlenecks when creating clusters. We can do it ourselves and that efficiency trickles down to support other projects and users.” And TTI is just getting started.

To further reduce dependency on data teams, TTI is using Unity Catalog to create a secure, up-to-date internal data library that other teams can source and access. As TTI builds out its ML and AI capabilities, it is extracting features from images to fill data gaps with supplemental data. While there's a lot of data out there, TTI is ensuring it has reliable data to build valid insights and strategies for location- or project-specific reporting.

Going forward, TTI is confident in its roadmap, equipped with the tools, scale and efficiency provided by the Databricks Data Intelligence Platform. Martin expands on that vision: “It’s about broadening the capabilities Databricks has afforded us to democratize data across the organization. The Databricks Platform is so easy to work with, and we know the more people we get using it, the more we’re going to be able to grow and innovate.”