
Databricks Simplifies and Scales Deep Learning with New Apache Spark Library

Company integrates TensorFlow into Apache Spark machine learning APIs; resulting models can be used directly by SQL analysts

June 6, 2017

San Francisco, CA -- (Marketwired - June 6, 2017) - Databricks, the company founded by the creators of the popular Apache Spark project, today announced Deep Learning Pipelines, a new library to integrate and scale out deep learning in Apache Spark.

Until now, deep learning has been unapproachable for many because it depends on separate, low-level frameworks that require specialized skills. Furthermore, these frameworks do not scale well because they only run on a single node. Today at Spark Summit 2017, Databricks is releasing Deep Learning Pipelines, an open source package that adds high-level, easy-to-use deep learning APIs for technologies such as TensorFlow to Apache Spark, making it possible for enterprises to scale deep learning across multiple nodes.

“This is a huge step in furthering Databricks’ mission to democratize artificial intelligence and data science,” said Matei Zaharia, cofounder and chief technologist at Databricks. “This work has the potential to accomplish for deep learning what Spark did for big data, which is to make it approachable to a much broader audience, from data scientists to business analysts.”

The new Deep Learning Pipelines package provides users with the ability to:

  • Easily call deep learning libraries within existing Spark ML workflows, making them immediately available to Spark developers without having to learn a separate tool;
  • Seamlessly perform transfer learning of deep learning models via Spark MLlib Pipelines, combining the power of deep learning with Spark’s data processing and ML capabilities;
  • Leverage Spark’s distributed computation engine with the integration of TensorFlow™ and Keras to quickly train and productionize high-quality models at scale;
  • Empower organizations to more broadly leverage AI through mechanisms that turn deep learning models into SQL functions for business and data analysts;
  • Work more easily with complex data such as images through a set of Spark-native utilities.

Deep Learning Pipelines for Apache Spark democratizes access to artificial intelligence in the enterprise by eliminating the barriers to deep learning and processing complex data at scale.

Read more about this announcement in the blog post:

Access a trial of Databricks:

About Databricks:

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz and NEA, has a global customer base that includes Salesforce, Viacom, Amgen, Shell and HP. For more information, visit

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
