In recent releases, TensorFlow has been enhanced for distributed learning and HDFS access. Outside of the Google cloud, however, users still needed a dedicated cluster for TensorFlow applications. Several community projects wire TensorFlow onto Apache Spark clusters. While these approaches are a step in the right direction, they support only synchronous distributed learning and don't allow TensorFlow servers to communicate with each other directly.
This session will introduce a new framework, TensorFlowOnSpark, for scalable TensorFlow learning, which will be open sourced in Q1 2017. This new framework enables easy experimentation with algorithm designs, and supports scalable training and inference on Spark clusters. It supports all TensorFlow functionality, including synchronous and asynchronous learning, model and data parallelism, and TensorBoard. It provides architectural flexibility for data ingestion into TensorFlow (pushing vs. pulling) and for server-to-server network protocols (gRPC and RDMA). Its Python API makes integration with existing Spark libraries such as MLlib easy.
The speakers will walk through multiple examples that illustrate these key capabilities, and share scalability benchmark results. Learn how, with a few lines of code changes, an existing TensorFlow algorithm can be transformed into a scalable application. You'll also take away practical guidance on how deep learning can be run easily in the cloud or on-premises with this new framework.
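To give a flavor of what "a few lines of code changes" looks like, here is an illustrative sketch of the conversion pattern, based on TensorFlowOnSpark's publicly documented API (`TFCluster`, `TFNode`). Parameter values and the hypothetical `args`/`dataRDD` variables are assumptions for illustration; the snippet requires a running Spark cluster with TensorFlowOnSpark installed, so it is a sketch rather than a standalone runnable program.

```python
# Illustrative sketch: wrapping an existing TensorFlow program for
# TensorFlowOnSpark. Assumes a live SparkContext `sc`, an `args` object,
# and a training RDD `dataRDD`; not runnable outside a Spark cluster.
from tensorflowonspark import TFCluster, TFNode

def map_fun(args, ctx):
    # The original TensorFlow code goes here, largely unchanged.
    # `ctx` carries the cluster spec and this executor's role/index.
    cluster, server = TFNode.start_cluster_server(ctx)
    if ctx.job_name == "ps":
        server.join()            # parameter servers just serve variables
    elif ctx.job_name == "worker":
        # ... build the graph and run the training loop against `server` ...
        pass

# Launch TensorFlow on Spark executors (here: 2 parameter servers,
# TensorBoard enabled), with Spark "pushing" data into TensorFlow.
cluster = TFCluster.run(sc, map_fun, args, num_executors=4, num_ps=2,
                        tensorboard=True,
                        input_mode=TFCluster.InputMode.SPARK)
cluster.train(dataRDD, num_epochs=1)
cluster.shutdown()
```

The pull-based alternative mentioned above corresponds to `InputMode.TENSORFLOW`, where each worker reads its own data (e.g. from HDFS) instead of receiving an RDD feed.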
Session hashtag: #SFdev9
Andy Feng is a VP Architect at Nvidia, building solutions to empower advanced AI research and applications in a variety of compute environments. Previously, he was a VP Architect at Yahoo, leading the architecture and design of big data and machine learning initiatives.
Lee Yang is a Senior Principal Engineer at Verizon/Oath (formerly Yahoo), working on large-scale systems and machine learning platforms.