Highways and Hexagons: Processing Large Geospatial Datasets With H3
Overview
Tuesday, June 10, 4:10 pm
| Detail | Value |
|---|---|
| Experience | In Person |
| Type | Breakout |
| Track | Data Engineering and Streaming |
| Industry | Public Sector |
| Technologies | Apache Spark, Databricks SQL, Databricks Workflows |
| Skill Level | Intermediate |
| Duration | 40 min |
The problem of matching GPS locations to roads and local government areas (LGAs) involves handling large datasets and a number of geospatial operations. In this deep dive, we will outline the challenges of developing scalable solutions for these tasks.
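For the LGA half of the problem, the exact question is whether a GPS point lies inside a given area boundary. One standard way to answer it is the even-odd ray-casting rule. The sketch below is illustrative rather than the speakers' method. It assumes planar coordinates and simple polygons without holes, and the `assign_lga` helper and sample boundaries are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: cast a horizontal ray toward +x and count edge crossings.

    `polygon` is a list of (x, y) vertices; the last vertex connects back to the first.
    Assumes planar coordinates and a simple polygon without holes.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets the horizontal line through the point.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_lga(x, y, lgas):
    """Hypothetical helper: return the name of the first area containing the point, else None."""
    for name, polygon in sorted(lgas.items()):
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Usage with two made-up adjacent square areas.
lgas = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(assign_lga(0.5, 0.5, lgas))
print(assign_lga(1.5, 0.5, lgas))
```

Testing every point against every polygon is what makes this expensive at scale, which is why a cheap index-based prefilter is worth having before the exact test.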
We will discuss our multi-step approach: first, using H3 indexing to isolate points that have a single candidate match, and then applying different geospatial computational techniques to accurately match points that have multiple candidates.
From a technical perspective, the talk will showcase broadcasting and partitioning techniques and their effect on autoscaling, memory usage, and effective data parallelization.
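The difference between the two join strategies can be sketched in plain Python. This is a simplified model, not Spark code. A broadcast join gives every worker a full copy of the small lookup table, so the points never move but each worker must hold the whole table in memory. A partitioned join shuffles both sides by the cell key so each worker holds only its own slice. The `partition_of` hash (CRC32 here) is an assumption standing in for Spark's internal hash partitioner, and the cell ids and candidate sets are made up.

```python
import zlib


def partition_of(cell, num_partitions):
    """Deterministic hash partitioner for a cell key (a stand-in for Spark's own hashing)."""
    return zlib.crc32(repr(cell).encode("utf-8")) % num_partitions


def hash_partition(records, key, num_partitions):
    """Split records into num_partitions lists so that equal keys land in the same list."""
    parts = [[] for _ in range(num_partitions)]
    for record in records:
        parts[partition_of(key(record), num_partitions)].append(record)
    return parts


def broadcast_join(point_parts, lookup):
    """Broadcast strategy: every partition of points probes a full copy of the lookup table.
    The points stay where they are, but every worker holds the whole table."""
    return [(point, lookup.get(point[0])) for part in point_parts for point in part]


def partitioned_join(points, lookup, num_partitions):
    """Partitioned strategy: shuffle both sides by cell key, then join within each partition.
    Each worker holds only the slice of the lookup table whose keys hash to its partition."""
    point_parts = hash_partition(points, lambda p: p[0], num_partitions)
    lookup_parts = hash_partition(lookup.items(), lambda kv: kv[0], num_partitions)
    joined = []
    for local_points, local_rows in zip(point_parts, lookup_parts):
        local_lookup = dict(local_rows)
        joined.extend((point, local_lookup.get(point[0])) for point in local_points)
    return joined


# Usage: points are (cell_id, point_id); the lookup maps cell_id -> candidate roads.
points = [((2, 0), "p1"), ((2, 1), "p2"), ((9, 2), "p3"), ((100, 100), "p4")]
lookup = {(2, 0): ("A",), (2, 1): ("A", "B"), (9, 2): ("C",)}
via_broadcast = broadcast_join([points[:2], points[2:]], lookup)  # arbitrary input split
via_partitions = partitioned_join(points, lookup, num_partitions=4)
print(sorted((p[1], v) for p, v in via_partitions if v is not None))
```

Both strategies return the same matches; they differ in what moves and what each worker must hold. In PySpark, `pyspark.sql.functions.broadcast` hints the first strategy and `DataFrame.repartition` on the key column sets up the second.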
This session is for anyone interested in geospatial data, Spark performance optimization, and the real-world challenges of large-scale data engineering.
Session Speakers
Petr Andreev, Senior Data Engineer, Mantel Group
Olivia Ren, Solution Architect, Databricks