Collaborative data science with familiar languages and tools
Work across engineering, data science and machine learning teams in one workspace. Use multiple languages, built-in data visualizations and automatic versioning, all within Notebooks.
Share Notebooks and work with peers across teams in multiple languages (R, Python, SQL and Scala) and libraries of your choice. Real-time coauthoring, commenting and automated versioning simplify collaboration while providing control.
Quickly discover new insights with built-in interactive visualizations, or leverage libraries such as Matplotlib and ggplot. Export results and Notebooks in HTML or IPYNB format, or build and share dashboards that always stay up to date.
Production at scale
Schedule Notebooks to automatically run machine learning and data pipelines at scale. Create multistage pipelines using Databricks Workflows. Set up alerts and quickly access audit logs for easy monitoring and troubleshooting.
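As a rough illustration of the scheduling and alerting described above, the Python sketch below builds a job definition that runs one Notebook on a schedule and emails someone on failure. The field names assume version 2.1 of the Databricks Jobs API and should be checked against the current API reference. The helper name, Notebook path, cluster ID, cron expression and email address are hypothetical placeholders.

```python
import json


def build_scheduled_job(name, notebook_path, cluster_id, cron, notify_email):
    """Return a job-settings dict for a Notebook that runs on a schedule.

    Field names are an assumption based on the Jobs API 2.1; verify before use.
    """
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "run_notebook",
                "existing_cluster_id": cluster_id,
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
        # Quartz cron syntax: seconds, minutes, hours, day-of-month, month, day-of-week
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": "UTC",
        },
        # Alert this address when a run fails
        "email_notifications": {"on_failure": [notify_email]},
    }


# All values below are placeholders for illustration only.
job = build_scheduled_job(
    name="nightly-feature-pipeline",
    notebook_path="/Shared/pipelines/build_features",
    cluster_id="0000-000000-example",
    cron="0 0 6 * * ?",
    notify_email="data-team@example.com",
)
print(json.dumps(job, indent=2))
```

The sketch only constructs the payload; sending it to a workspace would additionally need a host, an access token and an HTTP client.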
Data Access: Quickly access available data sets or connect to any data source, on-premises or in the cloud.
Multi-Language Support: Explore data using interactive Notebooks with support for multiple programming languages within the same notebook, including R, Python, Scala and SQL.
Interactive Visualizations: Visualize insights through a wide assortment of point-and-click visualizations. Or use powerful scriptable options like Matplotlib, ggplot and D3.
Real-Time Coauthoring: Work on the same notebook in real time while tracking changes with detailed revision history.
Comments: Leave a comment and notify colleagues from within shared Notebooks.
Automatic Versioning: Automatic change-tracking and versioning to help you pick up where you left off.
Git-based Repos: Simplified Git-based collaboration, reproducibility and CI/CD workflows.
Runs Sidebar: Automatically log experiments, parameters and results from Notebooks directly to MLflow as runs, and quickly see and load previous runs and code versions from the sidebar.
Dashboards: Share insights with your colleagues and customers, or let them run interactive queries with Spark-powered dashboards.
Run Notebooks as Jobs: Turn Notebooks or JARs into resilient production jobs with a click or an API call.
Jobs Scheduler: Execute jobs for production pipelines on a specific schedule.
Notebook Workflows: Create multistage pipelines with the control structures of the source programming language.
Notifications and Logs: Set up alerts and quickly access audit logs for easy monitoring and troubleshooting.
Permissions Management: Quickly manage access to individual Notebooks, collections of Notebooks and experiments with one common security model.
Clusters: Quickly attach Notebooks to auto-managed clusters to efficiently and cost-effectively scale up compute.
Integrations: Connect to Tableau, Looker, Power BI, RStudio and Snowflake, or work from your favorite IDE, such as VS Code, so data scientists and engineers can use their tools of choice.
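The Notebook Workflows item in the list above mentions building multistage pipelines with ordinary control structures. A minimal Python sketch of that idea follows, using a loop and try/except to run stages in order and retry failures. The `run_pipeline` helper, the stage paths and the fake runner are hypothetical. Inside a Databricks Notebook, the runner would typically be `dbutils.notebook.run`, which is assumed here to take a path, a timeout in seconds and a dictionary of arguments.

```python
from typing import Callable, Dict, List, Tuple

# A runner takes (notebook path, timeout in seconds, arguments) and returns a string result.
Runner = Callable[[str, int, Dict[str, str]], str]


def run_pipeline(
    run_notebook: Runner,
    stages: List[Tuple[str, Dict[str, str]]],
    max_retries: int = 2,
    timeout_seconds: int = 3600,
) -> List[str]:
    """Run each stage in order, retrying a failed stage up to max_retries times."""
    results = []
    for path, args in stages:
        for attempt in range(max_retries + 1):
            try:
                results.append(run_notebook(path, timeout_seconds, args))
                break  # stage succeeded, move on to the next one
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: stop the whole pipeline
    return results


# Stand-in runner so the sketch can run outside a workspace.
calls = []


def fake_runner(path, timeout_seconds, arguments):
    calls.append(path)
    # Simulate one transient failure on the first attempt at the training stage.
    if path == "/pipelines/train" and calls.count(path) == 1:
        raise RuntimeError("transient failure")
    return f"ok:{path}"


results = run_pipeline(
    fake_runner,
    [("/pipelines/ingest", {"date": "2024-01-01"}), ("/pipelines/train", {})],
)
print(results)
```

Passing the runner in as a parameter keeps the control flow testable without a cluster; in a workspace the same function could be called with `dbutils.notebook.run` in place of `fake_runner`.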
How it works
Shared, interactive Notebooks, experiments and extended file support allow data science teams to organize, share and manage complex data science projects more effectively throughout the lifecycle. APIs and the Jobs Scheduler allow data engineering teams to quickly automate complex pipelines, while business analysts can directly access results via interactive dashboards.
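To make the API-driven automation mentioned above more concrete, here is a hedged Python sketch that prepares an HTTP request to trigger an existing job. It assumes the `run-now` endpoint of the Databricks Jobs API 2.1 and bearer-token authentication. The workspace URL, token, job ID and parameter names are placeholders, and the helper name is hypothetical.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id, notebook_params):
    """Prepare (but do not send) a POST request that triggers a job run.

    The endpoint path and body fields are assumptions based on the Jobs API 2.1.
    """
    body = json.dumps(
        {"job_id": job_id, "notebook_params": notebook_params}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_run_now_request(
    host="https://example.cloud.databricks.com/",
    token="EXAMPLE_TOKEN",
    job_id=123,
    notebook_params={"run_date": "2024-01-01"},
)
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request), omitted so the sketch runs offline.
```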
Activating aviation data with real-time ML
Delivering revenue-generating experiences with data and ML
eBooks and Webinars
The Big Book of Data Science Use Cases
The Big Book of Machine Learning Use Cases
AutoML: Rapid, simplified machine learning for everyone
MLOps Virtual Event: Standardizing MLOps at Scale
Automating the ML Lifecycle With Databricks Machine Learning
The Data Scientist’s Guide to Apache Spark™