Skip to main content
Platform blog

The AI Dilemma

Artificial Intelligence (AI) has massive potential to drive disruptive innovations affecting most enterprises on the planet. However, most enterprises are struggling to succeed with AI . Why is that? Simply put, AI and Data are siloed in different systems and different organizations. Enterprise data is siloed across hundreds of systems such as data warehouses, data lakes, databases and file systems that are not AI-enabled. Popular machine learning frameworks such as TensorFlow, PyTorch, and SciKit-Learn don’t do data processing. Because these data systems don’t “do AI” and these AI technologies don’t “do data”, it’s extremely hard for enterprises to succeed with AI, which after all requires both ingredients to be successful.

Adding to this data and AI technology divide, companies have siloed teams for data and AI. Data engineers deal with large scale data processing and production deployment, whereas data scientists deal with AI - exploring data, training and validating models. Given the highly iterative nature of AI development, this organizational separation creates significant friction and slows AI initiatives considerably.

Divide between Data and AI

Unifying Data Science and Engineering

Unified Analytics is a new category of solutions that unifies data science and engineering, making AI much more achievable for enterprise organizations and enabling them to accelerate their AI initiatives. Unified Analytics brings the disparate worlds of data science and engineering together with a common platform— making it easier for data engineers to build data pipelines across siloed systems and prepare labelled datasets for model building while enabling data scientists to explore and visualize data and build models collaboratively. Unified Analytics provides one engine to prepare high quality data at massive scale and iteratively train machine learning models on the same data. Unified Analytics also provides collaboration capabilities for data scientists and data engineers to work effectively across the entire AI lifecycle. Organizations that succeed in unifying their data at scale and unifying that data with the best AI technologies will have significantly higher chance of success with  AI.

Databricks Unified Analytics Platform

Databricks Unified Analytics Platform, powered by Apache SparkTM, lowers the barrier for enterprises to innovate with AI and accelerates their innovation . It has three major components:

  1. Databricks Workspace to empower data science and engineering to collaborate on a common platform; Databricks Workspace is tightly integrated with cloud-native Spark clusters and enables data scientists to productively perform their tasks across the AI lifecycle — from data exploration at scale to the building of predictive models using machine learning and deep learning.
  2. Databricks Runtime to unify data and AI technologies at scale; Databricks Runtime uniquely combines data processing capabilities and machine learning frameworks into one unified analytics engine.
  3. Databricks Cloud Service automates and simplifies devops by abstracting the complexity of the data infrastructure by auto-configuring and auto-scaling clusters; and provides  enterprise grade security and compliance, along with best-in-class Spark support from the original creators of Apache Spark.

New Innovations in Databricks Unified Analytics Platform

At Spark + AI Summit in San Francisco, we proudly announced the following innovations to further unify data science and engineering to accelerate AI initiatives -

Databricks Delta - Making Data Ready for Machine Learning

Databricks Delta, now part of the Databricks Runtime, works with Apache Spark to make data reliable and ready for ML through transactional integrity on batch and streaming data. Delta also improves  performance up to 100x through caching and indexing capabilities. With Delta, data engineers can ensure that data scientists have early access to new reliable datasets at scale. Sign up for Databricks Delta here.

Databricks Runtime for ML - Simplifying Distributing Learning at Scale

Based on customers demand, we are very excited to announce the new native and deep integration of popular ML libraries including xgboost, scikit-learn, and numpy as well as popular deep learning framework such as TensorFlow, Keras, Horovod. This will dramatically simplify configuration and setup allowing data scientists to focus on their data and model building.

MLflow - Standardizing the Complete Machine Learning Lifecycle

MLflow is the first open source, cross-cloud framework that dramatically simplifies the workflow associated with building, deploying and iterating on machine learning. With MLflow, organizations can package their code for reproducible runs, execute and compare hundreds of parallel experiments, leverage any of the existing open source projects, and deploy models to production on a variety of serving platforms. You can learn more about MLflow at Matei’s blog and download MLflow here. For future updates on MLflow and Databricks integration, sign-up here.

Databricks Unified Analytics Platform

Databricks Unified Analytics Platform

Conclusion

Our goal is to accelerate innovation by unifying data science, engineering and business. We will continue to drive data science and engineering unification by adding new capabilities to the Databricks Unified Analytics Platform. Get started today with Databricks Unified Analytics Platform.