Apache Spark 2.1.0 boosted the performance of Apache Spark SQL through Project Tungsten software improvements. A further 16x speedup has been achieved with Oracle’s innovations for Apache Spark SQL, made possible by Oracle’s Software in Silicon accelerator offload technologies. In-memory performance of Apache Spark SQL is becoming more important for many reasons. Users are now performing more advanced SQL processing on multi-terabyte workloads. In addition, on-premises and cloud servers are being equipped with larger physical memory, which allows these huge workloads to be held in memory. In this talk we will look at using Spark SQL for feature creation and feature generation within Spark ML pipelines.
This presentation will explore workloads at scale and with complex interactions. We also provide best practices and tuning suggestions to support these kinds of workloads in real applications in cloud deployments. In addition, ideas for the next generation of Project Tungsten will be discussed.
Brad Carlile is Senior Director of Strategic Applications Engineering at Oracle. His engineering team explores the performance of x86 and SPARC servers on database, analytics, and application workloads. Additionally, his group conducts detailed performance analysis on Oracle systems and competitive systems. Before Oracle, he worked at Sun, where he was responsible for benchmarking and performance innovations. Before Sun, he worked on performance at Cray Research and Floating Point Systems. He holds a bachelor's degree in engineering from Northwestern University and is the author of over two dozen technical papers in high-performance commercial and scientific computing.