Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL

A long time ago, there were Caffe and Theano; then came Torch, CNTK, TensorFlow, Keras, MXNet, PyTorch, and Caffe2: a sea of deep learning tools, but none for Spark developers to dip into. Finally, there was BigDL, a deep learning library for Apache Spark. While BigDL is integrated into Spark and extends its capabilities to address the challenges of Big Data developers, will a library alone be enough to simplify and accelerate the deployment of ML/DL workloads on production clusters? From high-level pipeline API support to feature transformers to pre-defined models and reference use cases, a rich repository of easy-to-use tools is now available with the ‘Analytics Zoo’. We’ll unpack the production challenges and opportunities with ML/DL on Spark and what the Zoo can do.

About Radhika Rangarajan

Radhika is the Director of Engineering for Big Data Solutions within Intel’s Software and Services Group, where she manages several open source projects and partner engagements, specifically on Apache Spark and machine learning. She leads the team responsible for driving enterprise, cloud, and customer partnerships for Spark analytics and deep learning with BigDL. Radhika is also the co-founder and director of the West Coast chapter of Women in Big Data, a grassroots community focused on strengthening diversity and championing the success of women in big data and analytics.

About Mike Pittaro

Mike has over 25 years’ experience in the high technology industry, specializing in high performance computing, data warehousing, and distributed systems. He has held engineering and support positions at Alliant Computer, Kendall Square Research, Informatica, and SnapLogic. Mike is currently a Distinguished Engineer on Dell EMC's Open Source Solutions team, where he focuses on delivering big data solutions.