Data AI

Architecting MLOps on the Lakehouse

Here at Databricks, we have helped thousands of customers put Machine Learning (ML) into production. Shell has over 160 active AI projects saving...
Engineering blog

Three Principles for Selecting Machine Learning Platforms

June 24, 2021 by Joseph Bradley in Engineering Blog
This blog post is the second in a series on ML platforms, operations, and governance. For the first post, see Rafi Kurlansik’s post...
Engineering blog

Scaling Hyperopt to Tune Machine Learning Models in Python

October 29, 2019 by Joseph Bradley and Max Pumperla in Engineering Blog
Try the Hyperopt notebook to reproduce the steps outlined below and watch our on-demand webinar to learn more. Hyperopt is one of the...
Engineering blog

Automated Hyperparameter Tuning, Scaling and Tracking: On-Demand Webinar and FAQs now available!

Try this notebook in Databricks. On June 20th, our team hosted a live webinar, Automated Hyperparameter Tuning, Scaling and Tracking on Databricks, with...
Engineering blog

Hyperparameter Tuning with MLflow, Apache Spark MLlib and Hyperopt

Hyperparameter tuning is a common technique to optimize machine learning models based on hyperparameters, or configurations that are not learned during model training...
Engineering blog

Enhanced Hyperparameter Tuning and Optimized AWS Storage with Databricks Runtime 5.4 ML

We are excited to announce the release of Databricks Runtime 5.4 ML (Azure | AWS). This release includes two Public Preview...
Engineering blog

Databricks Runtime 5.2 ML Features Multi-GPU Workflow, Pregel API, and Performant GraphFrames

January 30, 2019 by Yifan Cao and Joseph Bradley in Engineering Blog
We are excited to announce the release of Databricks Runtime 5.2 for Machine Learning. This release includes several new features and performance improvements...
Engineering blog

Introducing HorovodRunner for Distributed Deep Learning Training

Today, we are excited to introduce HorovodRunner in our Databricks Runtime 5.0 ML! HorovodRunner provides a simple way to scale up your...
Company blog

Databricks Engineering Interns & Impact in Summer 2018

October 10, 2018 by Joseph Bradley in Company Blog
Thanks to our awesome interns! This summer, our Engineering interns at Databricks did amazing work. Our interns, working on teams from Developer Tools...
Engineering blog

Developing Custom Machine Learning Algorithms in PySpark

August 30, 2017 by Ajay Saini and Joseph Bradley in Engineering Blog
Developing custom Machine Learning (ML) algorithms in PySpark—the Python API for Apache Spark—can be challenging and laborious. In this blog post, we describe...
Engineering blog

Detecting Abuse at Scale: Locality Sensitive Hashing at Uber Engineering

This is a cross blog post effort between Databricks and Uber Engineering. Yun Ni is a software engineer on Uber’s Machine Learning Platform...
Engineering blog

Intel’s BigDL on Databricks

February 9, 2017 by Sue Ann Hong and Joseph Bradley in Engineering Blog
Try this notebook on Databricks. Intel recently released its BigDL project for distributed deep learning on Apache Spark. BigDL has native Spark integration...
Engineering blog

Deep Learning on Databricks

December 21, 2016 by Joseph Bradley and Tim Hunter in Engineering Blog
We are excited to announce the general availability of Graphics Processing Unit (GPU) and deep learning support on Databricks! This blog post will...
Company blog

On Demand Webinar and FAQ: Apache Spark MLlib 2.x: Migrating ML Workloads to DataFrames

December 14, 2016 by Joseph Bradley and Jules Damji in Company Blog
Last week, we held a live webinar, Apache Spark MLlib 2.x: Migrating ML Workloads to DataFrames, to demonstrate the ease with which...
Engineering blog

GPU Acceleration in Databricks

Databricks is adding support for Apache Spark clusters with Graphics Processing Units (GPUs), ready to accelerate Deep Learning workloads. With Spark deployments tuned...
Engineering blog

Apache Spark 2.0 Preview: Machine Learning Model Persistence

May 31, 2016 by Joseph Bradley in Engineering Blog
Consider these Machine Learning (ML) use cases: A data scientist produces an ML model and hands it over to an engineering team...
Engineering blog

Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles

Apache Spark is fast, but applications such as preliminary data exploration need to be even faster and are willing to sacrifice some...
Engineering blog

On-Time Flight Performance with GraphFrames for Apache Spark

Graph structures are a more intuitive approach to many classes of data problems. Whether traversing social networks, restaurant recommendations, or flight paths...
Engineering blog

Introducing GraphFrames

We would like to thank Ankur Dave from UC Berkeley AMPLab for his contribution to this blog post. Databricks is excited to announce...
Engineering blog

Auto-scaling scikit-learn with Apache Spark

February 8, 2016 by Tim Hunter and Joseph Bradley in Engineering Blog
Data scientists often spend hours or days tuning models to get the highest accuracy. This tuning typically involves running a large number of...
Engineering blog

MLlib Highlights in Apache Spark 1.6

January 21, 2016 by Joseph Bradley in Engineering Blog
To learn more about Apache Spark, attend Spark Summit East in New York in Feb 2016. With the latest release, Apache Spark’s...
Company blog

Visualizing Machine Learning Models

To try the new visualization features mentioned in this blog, sign up for a 14-day free trial of Databricks today. You've built your...
Engineering blog

Large Scale Topic Modeling: Improvements to LDA on Apache Spark

This blog was written by Feynman Liang and Joseph Bradley from Databricks, and Yuhao Yang from Intel. To get started using LDA, download...
Engineering blog

New Features in Machine Learning Pipelines in Apache Spark 1.4

Apache Spark 1.2 introduced Machine Learning (ML) Pipelines to facilitate the creation, tuning, and inspection of practical ML workflows. Spark’s latest release, Spark...
Engineering blog

Topic modeling with LDA: MLlib meets GraphX

March 25, 2015 by Joseph Bradley in Engineering Blog
Topic models automatically infer the topics discussed in a collection of documents. These topics can be used to summarize and organize documents, or...
Engineering blog

Random Forests and Boosting in MLlib

January 21, 2015 by Joseph Bradley and Manish Amde in Engineering Blog
This is a post written together with Manish Amde from Origami Logic. Apache Spark 1.2 introduces Random Forests and Gradient-Boosted Trees (GBTs) into...
Engineering blog

ML Pipelines: A New High-Level API for MLlib

MLlib’s goal is to make practical machine learning (ML) scalable and easy. Besides new algorithms and performance improvements that we have seen in...
Engineering blog

Scalable Decision Trees in MLlib

September 29, 2014 by Manish Amde and Joseph Bradley in Engineering Blog
This is a post written together with one of our friends at Origami Logic. Origami Logic provides a Marketing Intelligence Platform that uses...