We recently hosted a live webinar, Geospatial Analytics and AI in Public Sector, in which we covered top geospatial analysis use cases in the Public Sector, along with live demos showing how to build scalable analytics and machine learning pipelines on geospatial data at scale.
Geospatial Analytics Webinar Overview
Today, government agencies have access to massive volumes of geospatial information that can be analyzed to deliver on a broad range of decision-making and predictive analytics use cases from transportation planning to disaster recovery and population health management.
While many agencies have invested in geographic information systems that produce volumes of geospatial data, few have the proper technology and technical expertise to prepare these large, complex datasets for analytics — inhibiting their ability to build AI applications.
In this webinar, we reviewed:
- Top geospatial big data use cases in Public Sector spanning public safety, defense, infrastructure management, health services, fraud prevention and more
- Challenges analyzing large volumes of geospatial data with legacy architectures
- How Databricks and open-source tools can be used to overcome these challenges in the cloud
- Technical demos and notebooks shared on the webinar:
  - Object Detection in xView Imagery: Bridges complex object detection using Deep Learning with accessible SQL-based analytics for non-data scientist personas. Download related notebooks: data engineering and analysis.
  - Processing Large-Scale NYC Taxi Pickup / Dropoff Vectors: Optimizes geospatial predicate operations and joins to associate raw pick-up/drop-off coordinates with their corresponding NYC neighborhood boundaries to facilitate spatial analysis. Download related notebook.
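To make the idea behind the taxi demo concrete, here is a minimal pure-Python sketch of a point-in-polygon join with a cheap bounding-box pre-filter in front of the more expensive predicate. It is not the notebook's GeoMesa code; the zone names, coordinates, and function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Polygon = List[Point]         # ring of vertices; the last vertex connects back to the first


def bounding_box(polygon: Polygon) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a polygon."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count how many edges a ray heading east from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_neighborhood(point: Point, neighborhoods: Dict[str, Polygon]) -> Optional[str]:
    """Return the name of the first polygon containing the point, or None."""
    x, y = point
    for name, polygon in neighborhoods.items():
        min_lon, min_lat, max_lon, max_lat = bounding_box(polygon)
        # Cheap rejection first; the exact test runs only for nearby polygons.
        if not (min_lon <= x <= max_lon and min_lat <= y <= max_lat):
            continue
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical rectangular zones standing in for real neighborhood boundaries.
NEIGHBORHOODS = {
    "zone_a": [(-74.02, 40.70), (-73.98, 40.70), (-73.98, 40.74), (-74.02, 40.74)],
    "zone_b": [(-73.98, 40.74), (-73.94, 40.74), (-73.94, 40.78), (-73.98, 40.78)],
}
```

Calling `assign_neighborhood((-74.00, 40.72), NEIGHBORHOODS)` returns the zone containing that pickup point. At scale, the same pattern (coarse filter, then exact predicate) is typically delegated to a spatial framework with proper indexing rather than a Python loop.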
At the end of the webinar we held a Q&A. Below are the questions and answers:
Q: We deal with large volumes of streaming geospatial data. How would you recommend handling these real-time data streams for downstream analytics?
A: This can be broken down into (1) handling large volumes of streaming data and (2) performing downstream geospatial analytics. For the first part, Databricks makes processing and storing large volumes of streaming data simple, reliable, and performant; see Delta Lake on Databricks and Introduction to Delta Lake for additional material. The second part builds on the storage and schema decisions made during the processing phase. Spatial analysis is fundamentally addressed through Spark SQL, DataFrames, and Datasets, which power transformations and actions over data originating from various formats and schemas. Databricks offers runtimes such as the Machine Learning Runtime and Databricks Runtime with Conda, which pre-bundle popular libraries including TensorFlow, Horovod, PyTorch, scikit-learn, and Anaconda for both CPU and GPU clusters to facilitate common data engineering and data science needs. Customers can also manage their own Libraries or Containers to customize the environment for any analytic, including spatial-specific needs. See the popular spatial frameworks listed in the next question, as well as the FINRA Customer Case Study.
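As a rough illustration of the first part, the sketch below uses Spark Structured Streaming to read JSON point events and append them to a Delta table. The schema, paths, and function name are assumptions made for this example, not details from the webinar, and the code expects an existing `spark` session on a cluster where the Delta format is available.

```python
# Hypothetical schema for incoming point events, written as a Spark DDL string.
POINT_SCHEMA = "device_id STRING, lon DOUBLE, lat DOUBLE, event_time TIMESTAMP"

# Keep only rows whose coordinates fall in the valid longitude/latitude ranges.
# Rows with null coordinates are dropped too, because the predicate evaluates to null.
VALID_COORDS = "lon BETWEEN -180 AND 180 AND lat BETWEEN -90 AND 90"


def start_point_stream(spark, source_path, delta_path, checkpoint_path):
    """Start a streaming query that appends validated points to a Delta table."""
    points = (
        spark.readStream
        .format("json")
        .schema(POINT_SCHEMA)      # streaming file sources need an explicit schema
        .load(source_path)
        .filter(VALID_COORDS)
    )
    return (
        points.writeStream
        .format("delta")
        .outputMode("append")
        # The checkpoint lets the query resume where it left off after a restart.
        .option("checkpointLocation", checkpoint_path)
        .start(delta_path)
    )
```

Fixing the schema and validating coordinates at ingestion is one example of the "storage and schema decisions" that later spatial queries depend on.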
Q: What are some of the more popular spatial frameworks being used in the public sector?
A: Popular frameworks that extend Apache Spark for geospatial analytics include GeoMesa, GeoTrellis, RasterFrames, and GeoSpark. In addition, Databricks makes it easy to use single-node libraries such as GeoPandas, Shapely, the Geospatial Data Abstraction Library (GDAL), and the Java Topology Suite (JTS). By wrapping function calls in user-defined functions (UDFs), these libraries can also be leveraged in a distributed context. UDFs offer a simple approach for scaling existing workloads with minimal code changes.
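The UDF pattern can be sketched as follows. A plain Python haversine function stands in here for a call into a single-node library such as Shapely, and the column names (`pickup_lon`, `dropoff_lat`, and so on) are hypothetical; treat this as an illustration of the wrapping step rather than a recommended implementation.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    if None in (lon1, lat1, lon2, lat2):
        return None  # propagate missing coordinates instead of raising
    earth_radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def add_trip_distance(df):
    """Add a trip_km column to a Spark DataFrame by applying haversine_km per row."""
    # Imported lazily so the plain function above also works without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    haversine_udf = F.udf(haversine_km, DoubleType())
    return df.withColumn(
        "trip_km",
        haversine_udf(
            F.col("pickup_lon"), F.col("pickup_lat"),
            F.col("dropoff_lon"), F.col("dropoff_lat"),
        ),
    )
```

The single-node function is unchanged; only the thin `F.udf` wrapper is added, which is why this approach needs minimal code changes. The trade-off is that row-at-a-time Python UDFs carry serialization overhead, so the Spark-native frameworks listed above may perform better for heavy workloads.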
Q: Where is my data stored and how does Databricks help ensure data security?
A: Your data is stored in your own cloud data lake, such as Amazon S3 or Azure Blob Storage. However, data lakes often have data quality issues due to a lack of control over ingested data. Delta Lake adds a storage layer to data lakes to manage data quality, ensuring data lakes contain only high-quality data for consumers. Delta Lake also offers ACID transactions, which ensure data integrity with serializability, and an audit history: the transaction log records details about every change made to the data, providing a full history of changes for compliance, audit, and reproduction. Additionally, Delta Lake has been designed to address right-to-erasure requirements such as those in the General Data Protection Regulation (GDPR) and, more recently, the California Consumer Privacy Act (CCPA); see Make Your Data Lake CCPA Compliant with a Unified Approach to Data and Analytics. As part of our Enterprise Cloud Service, Delta Lake is tightly integrated with other Databricks Enterprise Security features.
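As a hedged sketch of how an erasure request and the audit history might look in practice, the helper below builds the Delta Lake SQL statements for deleting one subject's rows, purging old data files, and reviewing the change log. The table and column names are hypothetical, the helper functions are invented for this example, and this is not taken from the referenced CCPA article.

```python
def erasure_statements(table, key_column, key_value, retain_hours=168):
    """Build SQL that deletes one subject's rows, purges old files, and shows the log."""
    escaped = str(key_value).replace("'", "''")  # escape single quotes in the literal
    return [
        # Logically removes the rows; older data files still contain them for now.
        f"DELETE FROM {table} WHERE {key_column} = '{escaped}'",
        # Physically removes unreferenced files older than the retention window
        # (168 hours matches Delta Lake's default of 7 days).
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
        # Lists every recorded operation on the table, including the DELETE above.
        f"DESCRIBE HISTORY {table}",
    ]


def run_erasure(spark, table, key_column, key_value):
    """Execute the statements in order and return the history DataFrame."""
    results = [spark.sql(stmt) for stmt in erasure_statements(table, key_column, key_value)]
    return results[-1]
```

For example, `erasure_statements("trips", "rider_id", "r-123")` yields the three statements for a hypothetical `trips` table. Table and column names are interpolated directly, so they should come from trusted configuration rather than user input.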
Additional Geospatial Analytics Resources
- Sign up for a free trial and download these notebooks to start experimenting:
  - Data Engineering: Object Detection with xView
  - Analysis: Object Detection with xView
  - Analyzing NYC Taxis with GeoMesa
- Read our recent blog Processing Geospatial Data at Scale With Databricks to learn how the Databricks Unified Data Analytics Platform addresses challenges around ingesting, storing, and analyzing spatial data of massive size
- Download our Guide to Data Analytics and AI at Scale for the Public Sector
- Visit our Public Sector page to learn how the Centers for Medicare & Medicaid Services, DHS and other agencies are innovating with Databricks