Using Databricks Geospatial Processing at Scale
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Transportation |
| Technologies | Databricks SQL |
| Skill Level | Intermediate |
Our team started its geospatial journey on Databricks with DBR 16.X, beginning with a legacy project migration and expanding to ingest and process three geospatial datasets: HD USHR maps, public OSM, and curated OSM (Overture). We applied several optimization techniques to make spatial joins more performant and worked closely with the Databricks Geospatial team to test incremental improvements in DBR. This culminated in a large-scale performance increase on DBR 17.X, which added support for GEOMETRY and GEOGRAPHY as dedicated data types that can be stored in Delta tables. We observed a significant improvement in spatial joins, which could eliminate the need for the optimization techniques we used previously and make data pipelines easier to maintain. Together, these improvements enabled us to enrich internal datasets with road attributes from these sources and use them in analytics, training, and potential reinforcement learning applications where needed.
Session Speakers
Chinmay Gupte
Lead Software Engineer, Data
Rivian
Filip Ilic
Senior Data Engineer, Autonomy
Rivian