Apache Arrow support is new in Spark 2.3 and offers faster data interchange between Spark and Python. Arrow also has connections to TensorFlow (and even without those, TensorFlow can be fed from Pandas). This talk will look at how to use Arrow to accelerate copying data from Spark to TensorFlow, and how to expose basic functionality in Scala for working with TensorFlow. From there we will dive into how to construct new deep learning ML pipeline stages in Python and make them available to our friends in Scala land.
Session hashtag: #DL7SAIS
Holden is a transgender Canadian open source developer with a focus on Apache Spark, Airflow, Kubeflow, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and Kubeflow for Machine Learning. She is a committer and PMC member on Apache Spark. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.