Baidu’s deep learning technology has achieved top results on a range of challenging tasks in computer vision, image processing, natural language processing (NLP), and other areas. In the era of big data, a single integrated Spark platform that supports scalable deep learning training and prediction is essential, especially at Baidu’s scale. In this talk, we describe our work using Spark to drive deep learning training and prediction with Paddle, the deep learning library developed by Baidu IDL. This allows multiple Baidu production offline processing jobs to perform data ingestion, preprocessing, feature extraction, and model training in one Spark cluster. We will also discuss how we handle resource heterogeneity to support multi-tenancy on Spark. Finally, we will present some use cases and performance numbers.
Kyle Tsai is a Senior Architect at Baidu working on distributed infrastructure. Before joining Baidu, he worked on ad serving systems at Yahoo and Microsoft. He holds a Master’s degree in Computer Science from UCLA.