In the Big Data area, ETL (Extract, Transform, Load) is an important data processing procedure for transferring raw data from a source server to a data warehouse. A field-programmable gate array (FPGA) with highly customized intellectual property (IP) blocks can bring both better performance and lower power consumption when accelerating the compute-intensive segments of an application.
In this session, we describe how an FPGA can help a typical Spark ETL workload reduce high CPU utilization and free CPU cycles to run other compute-intensive jobs. Furthermore, FPGAs can also benefit deep learning applications for AI.
We will use micro-benchmarks to profile a typical ETL/AI workload, highlight the hotspots during data format transformation, and identify which functions consume the most CPU in the ETL procedure. By leveraging an FPGA accelerator, we can offload the functions with high CPU usage, such as data source parsing or data compression/decompression, to FPGA IPs and reserve CPU resources for mission-critical tasks, which can improve performance dramatically. Finally, we will walk through a real-world workload as a use case to explain how and when to use these FPGA IPs to optimize your Big Data applications.
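The profiling step described above can be sketched in a few lines. This is a minimal, illustrative micro-benchmark (not from the session itself): the record layout, sizes, and file-free in-memory setup are assumptions, chosen only to show how decompression and data-source parsing tend to dominate CPU time in an ETL step.

```python
# Hypothetical micro-benchmark: profile a small ETL step to see where
# CPU time goes. Record layout and data size are illustrative assumptions.
import cProfile
import gzip
import json
import pstats

def make_input(n=20000):
    # Build a gzip-compressed JSON-lines payload in memory.
    lines = "\n".join(json.dumps({"id": i, "val": i * 0.5}) for i in range(n))
    return gzip.compress(lines.encode())

def etl_step(payload):
    # Decompress (a compression/decompression hotspot), then parse each
    # record (a data-source parsing hotspot) -- the two kinds of functions
    # the session proposes moving onto FPGA IPs.
    text = gzip.decompress(payload).decode()
    return [json.loads(line) for line in text.splitlines()]

payload = make_input()
profiler = cProfile.Profile()
profiler.enable()
rows = etl_step(payload)
profiler.disable()

# Print the most expensive call sites by cumulative time.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)
```

On a typical run the top entries come from the JSON parsing and gzip decompression paths, which is exactly the kind of evidence used to decide which functions are worth offloading.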
Session hashtag: #SAISAI5
Weiting is a senior software engineer at Intel Software. He has worked on Big Data and cloud solutions, including Spark, Hadoop, OpenStack, and Kubernetes, for more than five years. He has also worked on big data and Intel architecture research covering CPU, GPU, and FPGA. One of his major responsibilities is researching and optimizing Big Data technology and enabling global customers to use Big Data with Intel solutions. Weiting is working on next-generation big data technologies on the Intel x86 platform.