
Announcing the Public Preview of Distributed ML on Serverless and Standard Clusters


Summary

  • Distributed ML support: Run Apache Spark™ MLlib (Python), Optuna, MLflow Spark, and Joblib Spark on serverless notebooks, jobs, and standard clusters.
  • Unified compute and governance: Scale ML workloads with built-in security, fine-grained access control, and multi-user isolation powered by Lakeguard and Spark Connect.
  • Comprehensive ML library offering: These additions complement existing single-node ML libraries such as Scikit-learn, XGBoost, and LightGBM, delivering a complete ML experience across both standard and serverless compute.

The Public Preview of Apache Spark MLlib (Python) and Optuna on serverless notebooks and jobs, as well as standard clusters, brings distributed machine learning to Databricks’ unified compute environments, combining performance, security, and ease of collaboration without the need for dedicated clusters.

From Dedicated to Serverless ML

Until now, distributed ML workloads such as training with Apache Spark MLlib or hyperparameter tuning with Optuna could only run on dedicated clusters. While effective, dedicated clusters are single-identity environments (user or group) that lack native fine-grained access control (FGAC), limiting secure multi-user collaboration.

With this release, Databricks extends distributed ML capabilities to both serverless and standard clusters, allowing teams to scale their machine learning workloads with built-in security and governance.

These enhancements complement existing single-node ML support, including Scikit-learn, XGBoost, and LightGBM, delivering a unified, end-to-end machine learning experience across all Databricks compute options.

Expanded ML Capabilities on Databricks Compute

On both serverless and standard clusters, Databricks users can now:

  • Train distributed models using Apache Spark MLlib (Python)
  • Perform large-scale hyperparameter tuning with Optuna
  • Track and manage experiments with MLflow Spark
  • Distribute single-node workloads from Scikit-learn, LightGBM, and XGBoost using Joblib Spark

Together, these capabilities unify the ML experience, enabling teams to scale seamlessly from local experimentation to distributed production workloads.

Unified Compute and Governance

Databricks’ Lakeguard technology, built on Spark Connect, powers both standard and serverless compute with fine-grained access control (FGAC) and multi-user isolation. This helps ensure that data and workloads are protected under the same governance layer, whether you manage your own clusters or rely on serverless compute.

Key benefits include:

  • Unified compute experience: Run distributed ML alongside analytics and ETL workloads on both standard and serverless compute.
  • Secure multi-user collaboration: Multiple users can run concurrent Spark workloads safely isolated within shared environments.
  • Native FGAC enforcement: Permissions, attribute-based access control (ABAC), row filters, and column masks are applied per user for secure access to features and models.

These capabilities, introduced in Spark 4, are now integrated into Databricks to deliver the next generation of distributed machine learning for modern data teams.

Hozumi Nakamo, Product Manager at SAP, shared:

"Apache Spark MLlib's availability on Databricks serverless compute empowers SAP Databricks customers to scale machine learning without infrastructure headaches, making it easy to unlock insights from business data securely and efficiently."

This reflects how Databricks serverless compute simplifies distributed ML — allowing customers to focus on insights rather than infrastructure.

Built with the Open Source Community

This milestone reflects Databricks’ continued collaboration with the open source community, including work with NVIDIA, a long-time contributor to Apache Spark. Together, Databricks and NVIDIA expanded Spark ML to Spark Connect as part of the Spark 4 release, enabling distributed ML workloads to run efficiently on both standard and serverless compute.

Andrew Feng, Vice President of GPU Software at NVIDIA, shared:

"Spark Connect represents a new era of accessibility and ease of adoption for Spark users. NVIDIA has been active in the open source Spark community for more than seven years. By extending Spark MLlib with support on Spark Connect, enterprises can now achieve effortless, end-to-end GPU acceleration with no code changes - delivering breakthrough performance gains of up to 9x while reducing costs by as much as 80%. This is the architecture we’ve adopted within NVIDIA and have helped enterprises transition to as well. It’s redefining what’s possible with data and AI at scale."

Through this collaboration with NVIDIA and the broader Spark community, Databricks continues to make distributed ML more performant, accessible, and cost-effective for every enterprise.

Get started

You can start running distributed ML on Databricks today:

  • On serverless compute: Attach your workload to serverless compute using environment version 4 or higher. Workloads run on CPUs today, with GPU support in beta.
  • On standard clusters: Use Databricks Runtime 17.0 or above, and run your workloads as usual.

