Data + AI Summit 2022

Quick to Production with the Best of Both Apache Spark and Tensorflow on Databricks

On Demand


  • Session
  • In-Person
  • Data Science, Machine Learning and MLOps
  • Retail and Consumer Goods
  • Intermediate
  • Moscone South | Upper Mezzanine | 151
  • 35 min


Using TensorFlow with big datasets has long been an impediment to building deep learning models, owing to the added complexity of running it in a distributed setting and of complicated MLOps code. Recent advancements in TensorFlow 2 and some extension libraries for Spark have simplified much of this. This talk focuses on how we can leverage the best of both Spark and TensorFlow to build machine learning and deep learning models with minimal MLOps code, letting Spark handle the grunt work so that we can focus on feature engineering and on building the model itself. This design also lets us use any library in the TensorFlow ecosystem (such as TensorFlow Recommenders) with the same boilerplate code.

For businesses like ours, fast prototyping and quick experimentation are key to building completely new experiences efficiently and iteratively. It is always preferable to have tangible results before putting more resources into a project. This design gives us that capability and lets us spend more time on research, building models, testing quickly, and iterating rapidly. It also gives us the flexibility to use our framework of choice at any stage of the machine learning lifecycle.

In this talk, we will go through some of the best new features of both Spark and TensorFlow, how to go from single-node training to distributed training with very few extra lines of code, how to use MLflow as a central model store, and finally how to use these models for batch and real-time inference.
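As a rough illustration of the "very few extra lines" claim, the sketch below trains the same Keras model either on a single device or under a TensorFlow distribution strategy. The abstract does not say which mechanism the speaker uses, so `tf.distribute.MirroredStrategy` is an assumption here, standing in for whatever Spark extension library the talk covers. The function names `build_model` and `train`, the tiny network, and the batch size are all hypothetical.

```python
import numpy as np
import tensorflow as tf


def build_model(num_features):
    """Build and compile a small regression network (placeholder architecture)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(num_features,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


def train(features, labels, distributed=False, epochs=1):
    """Train on one device, or mirror the model across local devices.

    Assumption: MirroredStrategy is used only as an example of a
    distribution strategy; the talk may rely on a different one.
    """
    if distributed:
        # The only extra lines: create a strategy and build the model in its scope.
        strategy = tf.distribute.MirroredStrategy()
        with strategy.scope():
            model = build_model(features.shape[1])
    else:
        model = build_model(features.shape[1])

    history = model.fit(features, labels, batch_size=32, epochs=epochs, verbose=0)
    return model, history.history["loss"]
```

Calling `train(x, y, distributed=True)` with NumPy arrays runs the same training loop as the single-device path; on a machine with one device the strategy simply uses a single replica. The model-building and fitting code is identical in both branches, which is the point the abstract makes.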

Session Speakers

Ronny Mathew

Manager, Data Science

Rue Gilt Groupe
