Fuwang Hu

Data Engineer, PayPal

Fuwang Hu is currently an MTS-1 data engineer in PayPal Global Data Governance and Regulation Technology, focusing on developing data applications that meet the requirements of various business scenarios, including risk management and enterprise compliance. Since obtaining a master's degree from Tongji University, Fuwang has gained 5+ years of experience building data applications with big data technologies such as Spark, Hadoop, and HBase.

Past sessions

Spark is now widely adopted in large enterprises for handling large volumes of data. At PayPal, more and more complex data processing applications run on top of Spark because of its better performance and ease of use. Graph analytics is an emerging trend across business use cases such as risk control and compliance. In this talk, we share our practices in building large-scale graph applications on top of Spark. How can you achieve a 4-5x performance improvement while handling billions of nodes and edges? How do you balance performance and resources efficiently? What are the key learnings from running enterprise production-level pipelines with Spark?

Key takeaways:

  • Introduce our use case of large graph analytics on Spark at PayPal.
  • Share the optimization practices we used to achieve large performance improvements.
  • Share key learnings about building production-level pipelines, e.g., how to cope with significant data skew and dramatic data growth, and how to build a scalable solution.