Data + AI Summit 2022

GIS Pipeline Acceleration with Apache Sedona

On Demand

Type

  • Session

Format

  • In-Person

Track

  • Data Engineering

Difficulty

  • Intermediate

Room

  • Moscone South | Level 3 | 314

Duration

  • 35 min

Overview

Geospatial processing with commonly used tools such as geopandas slows down as data volumes grow. In this talk, we discuss large-scale geospatial processing on Databricks using Apache Sedona. Apache Sedona is an open-source package that extends Apache Spark to work with GIS artefacts such as polygons and adds common GIS functions such as intersect and overlay.
At CKDelta, we have used Apache Sedona to process 25 billion records daily since version 1.0.0 was released last year, and we have seen significant performance gains. We'll share our experience and the benefits of using it, and present our solutions for setting up Apache Sedona on Databricks, avoiding common pitfalls, resolving issues, and implementing a GIS data pipeline on Databricks.

Session Speakers

Fernando Ayuso Palacios

Director of Data Science and Data Engineering

CKDelta (Hutchison Group)

Alihan Zihna

Senior Data Scientist

CKDelta
