Performance troubleshooting of distributed data processing systems is a complex task. Apache Spark comes to the rescue with a large set of metrics and instrumentation that you can use to understand and improve the performance of your Spark-based applications. You will learn about the metric-based instrumentation available in Apache Spark: executor task metrics and the Dropwizard-based metrics system. The talk covers how the Hadoop and Spark service at CERN uses Apache Spark metrics to troubleshoot performance and measure production workloads. Notably, it shows how to deploy a performance dashboard for Spark workloads and how to use sparkMeasure, a tool based on the Spark Listener interface. The speaker will discuss the lessons learned so far and the improvements you can expect in this area in Apache Spark 3.0.
Luca is a data engineer at CERN working with the Hadoop, Spark, streaming, and database services. He has 20+ years of experience designing, deploying, and supporting enterprise-level database and data services, with a special interest in methods and tools for performance troubleshooting. Luca is active in developing and supporting platforms for data analytics and ML for the CERN community, including the LHC experiments, the accelerator sector, and CERN IT. He enjoys sharing experience and knowledge with data communities in science and industry at large.