GIS Pipeline Acceleration with Apache Sedona
On Demand
Type
- Session
Format
- In-Person
Track
- Data Engineering
Difficulty
- Intermediate
Room
- Moscone South | Level 3 | 314
Duration
- 35 min
Overview
Geospatial processing with commonly used tools such as GeoPandas slows down as datasets grow. In this talk, we cover large-scale geospatial processing on Databricks using Apache Sedona. Apache Sedona is an open-source package that extends Apache Spark to work with GIS artefacts such as polygons and adds common GIS functions such as intersect and overlay.
We've been using Apache Sedona at CKDelta to process 25 billion records daily since version 1.0.0 was released last year, and have seen significant performance gains. We'll share our experience and the benefits of using it, and present our solutions for setting up Apache Sedona on Databricks, avoiding common pitfalls, resolving issues, and implementing a GIS data pipeline on Databricks.
Relive the best moments of Data+AI Summit