Piotr Balcer is a software engineer with many years’ of experience working on storage related technologies at Intel Corporation. He received B.Eng. from the Gdansk University of Technology in 2014 where he studied system software engineering. For fours years now he has been working on software ecosystem for next-gen non-volatile memory.
April 23, 2019 05:00 PM PT
The capacity of data grows rapidly in big data area, more and more memory are consumed either in the computation or holding the intermediate data for analytic jobs. For those memory intensive workloads, end-point users have to scale out the computation cluster or extend memory with storage like HDD or SSD to meet the requirement of computing tasks. For scaling out the cluster, the extra cost from cluster management, operation and maintenance will increase the total cost if the extra CPU resources are not fully utilized. To address the shortcoming above, Intel Optane DC persistent memory (Optane DCPM) breaks the traditional memory/storage hierarchy and scale up the computing server with higher capacity persistent memory.
Also it brings higher bandwidth & lower latency than storage like SSD or HDD. And Apache Spark is widely used in the analytics like SQL and Machine Learning on the cloud environment. For cloud environment, low performance of remote data access is typical a stop gap for users especially for some I/O intensive queries. For the ML workload, it's an iterative model which I/O bandwidth is the key to the end-2-end performance. In this talk, we will introduce how to accelerate Spark SQL with OAP to accelerate SQL performance on Cloud to archive 8X performance gain and RDD cache to improve K-means performance with 2.5X performance gain leveraging Intel Optane DCPM. Also we will have a deep dive how Optane DCPM for these performance gains.