GIS Pipeline Acceleration with Apache Sedona
- Data Engineering
- Moscone South | Level 3 | 314
- 35 min
Geospatial processing with commonly used tools like GeoPandas can become slow as data volumes grow. In this talk, we will discuss large-scale geospatial processing on Databricks using Apache Sedona. Apache Sedona is an open-source package that extends Apache Spark to work with GIS artefacts such as polygons, and it provides common GIS operations such as intersection and overlay.
At CKDelta, we have been using Apache Sedona to process 25 billion records daily since version 1.0.0 was released last year, and we have seen significant performance gains. We'll share our experience and the benefits of using it. We'll also present our solutions for setting up Apache Sedona on Databricks, avoiding common pitfalls, troubleshooting issues, and implementing a GIS data pipeline on Databricks.