Chris is a Staff Software Engineer in John Deere’s Intelligent Solutions Group. He designs and develops large-scale data pipelines and production-grade machine learning tools and infrastructure, and he works closely with John Deere’s data scientists to deliver machine learning models to customers around the world. He specializes in large-scale data engineering and cloud-based backend systems, with a particular interest in machine learning applications. Before joining John Deere, Chris gained broad software engineering experience building microservices, web applications, native mobile applications, and control systems software in both the agriculture and financial sectors.
May 27, 2021 04:25 PM PT
John Deere ingests petabytes of precision agriculture data every year from its customers' farms around the world. To scale our data science efforts globally, our data scientists need to run geospatial analyses on our data lake efficiently and at scale. In this talk, we will describe some of the methods our data engineering team developed for efficient geospatial queries, including:
- Using quadtree spatial indexing to partition our Delta Lake tables
- Extending the Spark Catalyst Optimizer to perform efficient geospatial joins in our data lake
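To make the two ideas above more concrete, here is a minimal, framework-free sketch in plain Python; it is not John Deere's implementation and does not use Spark or Delta Lake. It assumes a simple equirectangular quadtree over longitude [-180, 180] and latitude [-90, 90], with cell keys encoded as Bing Maps-style quadkey strings (one digit per level). All function names (`tile_xy`, `tile_to_quadkey`, `covering_quadkeys`, `bucketed_join`) are hypothetical. A quadkey computed at a coarse level could serve as a partition column, and because a child cell's key always starts with its parent's key, a bounding-box query can be reduced to a small set of key prefixes. `bucketed_join` illustrates the general filter-and-refine pattern that a spatial join optimization might rewrite a point-in-region predicate into: an equi-join on cell key followed by an exact geometric check. It does not show how to register a Catalyst rule.

```python
from collections import defaultdict


def tile_xy(lon, lat, level):
    """Map a lon/lat point to integer tile coordinates at a quadtree level.

    Assumes an equirectangular grid; y grows from north (lat 90) to south.
    """
    n = 1 << level  # 2^level tiles per axis
    tx = min(max(int((lon + 180.0) / 360.0 * n), 0), n - 1)
    ty = min(max(int((90.0 - lat) / 180.0 * n), 0), n - 1)
    return tx, ty


def tile_to_quadkey(tx, ty, level):
    """Interleave tile bits into a quadkey: one base-4 digit per level."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tx & mask:
            digit += 1  # east half of the parent cell
        if ty & mask:
            digit += 2  # south half of the parent cell
        digits.append(str(digit))
    return "".join(digits)


def covering_quadkeys(min_lon, min_lat, max_lon, max_lat, level):
    """Return the quadkeys of all cells intersecting a bounding box.

    Does not handle boxes that cross the antimeridian.
    """
    x0, y0 = tile_xy(min_lon, max_lat, level)  # north-west corner
    x1, y1 = tile_xy(max_lon, min_lat, level)  # south-east corner
    return sorted(
        tile_to_quadkey(x, y, level)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    )


def bucketed_join(points, boxes, level):
    """Join points to boxes: match on cell key, then refine exactly.

    points: list of (point_id, lon, lat)
    boxes:  list of (box_id, min_lon, min_lat, max_lon, max_lat)
    """
    # Filter step: index each box under every cell it overlaps.
    index = defaultdict(list)
    for box in boxes:
        for key in covering_quadkeys(*box[1:], level):
            index[key].append(box)

    # Refine step: only test boxes that share the point's cell.
    result = []
    for pid, lon, lat in points:
        key = tile_to_quadkey(*tile_xy(lon, lat, level), level)
        for bid, x0, y0, x1, y1 in index.get(key, []):
            if x0 <= lon <= x1 and y0 <= lat <= y1:
                result.append((pid, bid))
    return result


if __name__ == "__main__":
    points = [("a", -90.0, 45.0), ("b", 10.0, -10.0), ("c", 100.0, 0.0)]
    boxes = [("B1", -100.0, 40.0, -80.0, 50.0), ("B2", 0.0, -20.0, 20.0, 0.0)]
    print(bucketed_join(points, boxes, level=2))
```

Running the script prints `[('a', 'B1'), ('b', 'B2')]`. In a real data lake the level would be a tuning choice: coarser levels give fewer, larger partitions, while finer levels prune more aggressively but risk many small files.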