Spark SQL enables Spark to perform efficient and fault-tolerant relational query processing with analytics database technologies. The relational queries are compiled to the executable physical plans consisting of transformations and actions on RDDs with the generated Java code. The code is compiled to Java bytecode, executed at runtime by JVM and optimized by JIT to native machine code at runtime. This talk will take a deep dive into Spark SQL execution engine. The talk includes pipelined execution, whole-stage code generation, UDF execution, memory management, vectorized readers, lineage based RDD transformation and action.
Maryann Xue is a software engineer at Databricks, committer and PMC member of Apache Calcite and Apache Phoenix. Previously, she worked on a number of big data and compiler projects at Intel.
Kris Mok is a software engineer at Databricks. He works on various components of Spark SQL, with interest on optimizer and code generation. Previously, he worked on JVM implementations, including OpenJDK HotSpot VM at Alibaba and Oracle and Zing VM at Azul, and had broad interest in programming language design and implementation.
Xingbo Jiang is a software engineer at Databricks, where he investigates the use cases on Spark Core and Spark SQL. Xingbo is an active contributor to Apache Spark. His areas of interest include distributed system, database, and data warehouse.