Engineering blog

Announcing General Availability of Ray on Databricks

We released the Ray support public preview last year, and since then hundreds of Databricks customers have been using it for a variety of use...

Announcing Ray Autoscaling support on Databricks and Apache Spark™

Ray is an open-source unified compute framework that simplifies scaling AI and Python workloads in a distributed environment. Since we introduced support for...

Introducing Apache Spark™ 3.5

Today, we are happy to announce the availability of Apache Spark™ 3.5 on Databricks as part of Databricks Runtime 14.0. We extend our...

Announcing Ray support on Databricks and Apache Spark Clusters

Ray is a prominent compute framework for running scalable AI and Python workloads, offering a variety of distributed machine learning tools, large-scale hyperparameter...

Simplify Data Conversion from Apache Spark to TensorFlow and PyTorch

June 16, 2020 by Liang Zhang and Weichen Xu in Engineering Blog
Petastorm is a popular open-source library from Uber that enables single-machine or distributed training and evaluation of deep learning models from datasets...

Introducing Built-in Image Data Source in Apache Spark 2.4

December 10, 2018 by Tomas Nykodym and Weichen Xu in Engineering Blog
With recent advances in deep learning frameworks for image classification and object detection, the demand for standard image processing in Apache Spark...