Platform blog

Unity Catalog Lakeguard: Industry-first and only data governance for multi-user Apache™ Spark clusters

We are thrilled to announce Unity Catalog Lakeguard, which allows you to run Apache Spark™ workloads in SQL, Python, and Scala with...
Generative AI

DSPy on Databricks

Large language models (LLMs) have generated interest in effective human-AI interaction through optimizing prompting techniques. “Prompt engineering” is a growing methodology for tailoring...
Company blog

Announcing DBRX: A new standard for efficient open source LLMs

Databricks’ mission is to deliver data intelligence to every enterprise by allowing organizations to understand and use their unique data to build their...
Company blog

Lilac Joins Databricks to Simplify Unstructured Data Evaluation for Generative AI

Today, we are thrilled to announce that Lilac is joining Databricks. Lilac is a scalable, user-friendly tool for data scientists to search, cluster...
Platform blog

Architecting Global Data Collaboration with Delta Sharing

In today's interconnected digital landscape, data sharing and collaboration across organizations and platforms are crucial for modern business operations. Delta Sharing, an innovative...
Platform blog

Data Intelligence Platforms

The observation that "software is eating the world" has shaped the modern tech industry. Today, software is ubiquitous in our lives...
Platform blog

Introducing Predictive Optimization: Faster Queries, Cheaper Storage, No Sweat

Predictive Optimization intelligently optimizes your Lakehouse table data layouts for peak performance and cost-efficiency - without you needing to lift a finger.
Platform blog

The Simplification of AI Data

Talk to any data science organization and they will almost unanimously tell you that the biggest challenge to building high quality AI models...
Company blog

Databricks + MosaicML

Today, we’re excited to share that we’ve completed our acquisition of MosaicML, a leading platform for creating and customizing generative AI models for...
Company blog

Helping Enterprises Responsibly Deploy AI

The promise of artificial intelligence (AI) is undeniable, but its enormous potential also comes with enormous responsibilities. Companies and organizations around the world...
Engineering blog

Project Lightspeed Update - Advancing Apache Spark Structured Streaming

In this blog post, we will review the advancements in Spark Structured Streaming since we announced Project Lightspeed a year ago, from performance...
Platform blog

What’s new with Unity Catalog at Data and AI Summit 2023

The fundamental principles of governance – accountability, compliance, quality, and transparency – that are essential for data management have now become equally imperative...
Platform blog

Introducing Lakehouse Federation Capabilities in Unity Catalog

Lakehouse Federation is now in public preview! Data teams face many challenges to quickly access the right data primarily due to data fragmentation...
Platform blog

Introducing LakehouseIQ: The AI-Powered Engine that Uniquely Understands Your Business

Today, we are thrilled to announce LakehouseIQ, a knowledge engine that learns the unique nuances of your business and data to power natural...
Platform blog

Lakehouse AI: A Data-Centric Approach to Building Generative AI Applications

Generative AI will have a transformative impact on every business. Databricks has been pioneering AI innovations for a decade, actively collaborating with thousands...
Platform blog

Introducing Lakehouse Apps

Lakehouse Apps is a new way to build native applications for Databricks. Lakehouse Apps will offer the most secure way to build, distribute...
Platform blog

Extending Databricks Unity Catalog with an Open Apache Hive Metastore API

Today, we are excited to announce the preview of a Hive Metastore (HMS) interface for Databricks Unity Catalog, which allows any software...
Engineering blog

Latency goes subsecond in Apache Spark Structured Streaming

Apache Spark Structured Streaming is the leading open source stream processing platform. It is also the core technology that powers streaming on the...
Company blog

Welcome Okera: Adopting an AI-centric approach to governance

For a decade, Databricks has focused on democratizing data and AI for organizations around the world. And since the debut of ChatGPT last...
Company blog

Enroll in our New Expert-Led Large Language Models (LLMs) Courses on edX

New Large Language Model Courses with edX: As Large Language Model (LLM) applications disrupt countless industries, generative AI is becoming an important foundational...
Company blog

Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM

Two weeks ago, we released Dolly, a large language model (LLM) trained for less than $30 to exhibit ChatGPT-like human interactivity (aka...
Company blog

Hello Dolly: Democratizing the magic of ChatGPT with open models

Update Apr 12, 2023: We have released Dolly 2.0, licensed for both research and commercial use. See the new blog post here...
Platform blog

Announcing General Availability of Delta Sharing

Today we are excited to announce that Delta Sharing is generally available (GA) on AWS and Azure. With the GA release, you can...
Company blog

Open Sourcing All of Delta Lake

The theme of this year's Data + AI Summit is that we are building the modern data stack with the lakehouse. A fundamental...
Platform blog

Introducing Databricks Marketplace

We're pleased to announce Databricks Marketplace, an open marketplace for exchanging data products such as datasets, notebooks, dashboards, and machine learning models. To...
Platform blog

Introducing Data Clean Rooms for the Lakehouse

We are excited to announce data clean rooms for the Lakehouse, allowing businesses to easily collaborate with their customers and partners on any...
Engineering blog

Project Lightspeed: Faster and Simpler Stream Processing With Apache Spark

Streaming data is a critical area of computing today. It is the basis for making quick decisions on the enormous amounts of incoming...
Company blog

Apache Spark and Photon Receive SIGMOD Awards

June 15, 2022 by Reynold Xin and Matei Zaharia in Company Blog
This week, many of the most influential engineers and researchers in the data management community are convening in-person in Philadelphia for the ACM...
Platform blog

Announcing General Availability of Databricks Feature Store

Today, we are thrilled to announce that Databricks Feature Store is generally available (GA)! In this blog post, we explore how Databricks Feature...
Platform blog

Announcing Gated Public Preview of Unity Catalog on AWS and Azure

Update: Unity Catalog is now generally available on AWS and Azure. At the Data and AI Summit 2021, we announced Unity Catalog...
Platform blog

Top Three Data Sharing Use Cases With Delta Sharing

Update: Delta Sharing is now generally available on AWS and Azure. Data sharing has become an essential component to drive business value as...
Company blog

Snowflake Claims Similar Price/Performance to Databricks, but Not So Fast!

On Nov 2, 2021, we announced that we set the official world record for the fastest data warehouse with our Databricks SQL lakehouse...
Company blog

Simplifying Data + AI, One Line of TypeScript at a Time

October 21, 2021 by Reynold Xin and Matei Zaharia in Culture
Today, Databricks is known for our backend engineering, building and operating cloud systems that span millions of virtual machines processing exabytes of data...
Platform blog

Frequently Asked Questions About the Data Lakehouse

Question Index: What is a Data Lakehouse? What is a Data Lake? What is a Data Warehouse? How is a Data Lakehouse different...
Engineering blog

Monitoring ML Models With Model Assertions

This is a guest post from the Stanford University Computer Science Department. We thank Daniel Kang, Deepti Raghavan and Peter Bailis of Stanford...
Platform blog

Introducing Delta Sharing: An Open Protocol for Secure Data Sharing

Update: Delta Sharing is now generally available on AWS and Azure. Get an early preview of O'Reilly's new ebook for the step-by-step guidance...
Engineering blog

An Update on Project Zen: Improving Apache Spark for Python Users

September 4, 2020 by Hyukjin Kwon and Matei Zaharia in Engineering Blog
Apache Spark™ has reached its 10th anniversary with Apache Spark 3.0 which has many significant improvements and new features including but not limited...
Company blog

Spark + AI Summit Europe is Expanding and Getting a New Name: Data + AI Summit Europe

September 2, 2020 by Ali Ghodsi, Reynold Xin and Matei Zaharia in Company Blog
Back in 2013, we held the first Spark Summit — a gathering of the Apache Spark™ community with leading contributors and production users...
Company blog

Introducing the Next-Generation Data Science Workspace

At today’s Spark + AI Summit 2020, we unveiled the next generation of the Databricks Data Science Workspace: An open and unified experience...
Company blog

MLflow Joins the Linux Foundation to Become the Open Standard for Machine Learning Platforms

At today's Spark + AI Summit 2020, we announced that MLflow is becoming a Linux Foundation...
Company blog

Introducing Apache Spark 3.0

We’re excited to announce that the Apache Spark™ 3.0.0 release is available on Databricks as part of our new Databricks Runtime 7.0...
Platform blog

Evolving the Databricks brand

Some brands start out as, well, brands. A lot of work goes into the concept and painting the picture before the business is...
Engineering blog

What Is a Lakehouse?

Read Building the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data warehouse...
Company blog

Introducing the MLflow Model Registry

At today’s Spark + AI Summit in Amsterdam, we announced the availability of the MLflow Model Registry...
Company blog

Announcing the MLflow 1.1 Release

We’re excited to announce today the release of MLflow 1.1. In this release, we’ve focused on fleshing out the tracking component of MLflow...
Engineering blog

Announcing the MLflow 1.0 Release

MLflow is an open source platform to help manage the complete machine learning lifecycle. With MLflow, data scientists can track and share experiments...
Platform blog

Introducing MLflow Run Sidebar in Databricks Notebooks

April 30, 2019 by Andrew Chen and Matei Zaharia in Announcements
At Spark+AI Summit 2019, we announced the GA of Managed MLflow on Databricks in which we take the latest and greatest of open...
Platform blog

Announcing General Availability of Managed MLflow on Databricks

MLflow is an open source platform to help manage the complete machine learning lifecycle. With MLflow, data scientists...
Platform blog

Managed MLflow on Databricks now in public preview

Building production machine learning applications is challenging because there is no standard way to record experiments, ensure reproducible...
Engineering blog

Kicking Off 2019 with an MLflow User Survey

January 8, 2019 by Matei Zaharia in Engineering Blog
It’s been six months since we launched MLflow, an open source platform to manage the machine learning lifecycle, and the project has...
Platform blog

MLflow On-Demand Webinar and FAQ Now Available!

September 12, 2018 by Matei Zaharia and Denny Lee in Product
On August 30th, our team hosted a live webinar, Introducing MLflow: Infrastructure for a complete Machine Learning lifecycle, with Matei Zaharia, Co-Founder and...
Engineering blog

MLflow 0.2 Released

At this year’s Spark+AI Summit, we introduced MLflow, an open source platform to simplify the machine learning lifecycle. In the 3...
Engineering blog

Introducing MLflow: an Open Source Machine Learning Platform

Everyone who has tried to do machine learning development knows that it is complex. Beyond the...
Company blog

Matei Zaharia’s 5 predictions about big data and AI in 2018

January 17, 2018 by Matei Zaharia in Company Blog
Over the past few years, the demand for artificial intelligence (AI) and machine learning capabilities has surged with innovations in natural language processing...
Company blog

Spark Summit is Becoming the Spark + AI Summit

December 6, 2017 by Matei Zaharia in Company Blog
We’re excited to announce that Spark Summit is expanding its coverage in 2018 to include in-depth content on artificial intelligence. We are also...
Company blog

A Technical Overview of Azure Databricks

November 15, 2017 by Matei Zaharia and Peter Carlin in Company Blog
This is a joint blog post from Matei Zaharia, Chief Technologist at Databricks and Peter Carlin, Distinguished Engineer at Microsoft. Today at Microsoft...
Company blog

Databricks Delta: A Unified Data Management System for Real-time Big Data

Combining the best of data warehouses, data lakes and streaming. For an in-depth look and demo, join the webinar. Today we are...
Platform blog

Sharing Knowledge with the Community in a Preview of Apache Spark: The Definitive Guide

Apache Spark has seen immense growth over the past several years. The size and scale of this Spark Summit is a true reflection...
Company blog

Databricks and Apache Spark 2016 Year in Review

Spark Summit will be held in Boston on Feb 7-9, 2017. Check out the full agenda and get your ticket before it sells...
Engineering blog

Spark Structured Streaming

Apache Spark 2.0 adds the first version of a new higher-level API, Structured Streaming, for building continuous applications. The main goal is...
Engineering blog

Continuous Applications: Evolving Streaming in Apache Spark 2.0

July 28, 2016 by Matei Zaharia in Engineering Blog
Since its release, Spark Streaming has become one of the most widely used distributed streaming engines, thanks to its high-level API and exactly-once...
Engineering blog

Introducing Apache Spark 2.0

Today, we're excited to announce the general availability of Apache Spark 2.0 on Databricks. This release builds on what the community has learned...
Company blog

Introducing Databricks Community Edition: Apache Spark for All

February 17, 2016 by Ion Stoica and Matei Zaharia in Company Blog
As developers at heart, we at Databricks are committed to the development of Apache Spark and the continued growth of the community. Today...
Company blog

Databricks 2015 Year In Review: Democratizing Access to Data

To learn more about Apache Spark, attend Spark Summit East in New York in Feb 2016. 2015 has been a phenomenal year...
Engineering blog

Apache Spark 2015 Year In Review

To learn more about Apache Spark, attend Spark Summit East in New York in Feb 2016. 2015 has been a year of...
Engineering blog

Introducing Apache Spark Datasets

Developers have always loved Apache Spark for providing APIs that are simple yet powerful, a combination of traits that makes complex analysis possible...
Company blog

Spark Survey 2015 Results are now available

September 24, 2015 by Matei Zaharia, Patrick Wendell and Denny Lee in Company Blog
We ran the Spark Survey 2015 this summer to gain insights on how organizations are using Apache Spark. The results of this year’s...
Engineering blog

Diving into Apache Spark Streaming's Execution Model

With so many distributed stream processing engines available, people often ask us about the unique benefits of Apache Spark Streaming. From early...
Company blog

Databricks is now Generally Available

June 15, 2015 by Ion Stoica and Matei Zaharia in Company Blog
We are excited to announce today, at Spark Summit 2015, the general availability of Databricks – a hosted data platform from...
Engineering blog

Deep Dive into Spark SQL's Catalyst Optimizer

Check out the Why the Data Lakehouse is Your Next Data Warehouse ebook to discover the inner workings of the Databricks Lakehouse Platform...
Engineering blog

Apache Spark Turns Five Years Old!

March 31, 2015 by Matei Zaharia in Engineering Blog
Today, we’re celebrating an important milestone for the Apache Spark project -- it’s now been five years since Spark was first open sourced...
Engineering blog

Apache Spark: A review of 2014 and looking ahead to 2015 priorities

February 13, 2015 by Patrick Wendell and Matei Zaharia in Engineering Blog
2014 has been a year of tremendous growth for Apache Spark. It became the most active open source project in the Big Data...
Company blog

"Learning Spark" book available from O'Reilly

Today we are happy to announce that the complete Learning Spark book is available from O’Reilly in e-book form with the print copy...
Engineering blog

The State of Apache Spark in 2014

July 18, 2014 by Matei Zaharia in Engineering Blog
This post originally appeared in insideBIGDATA and is reposted here with permission. With the second Spark Summit behind us, we wanted to take...
Engineering blog

Making Apache Spark Easier to Use in Java with Java 8

One of Apache Spark’s main goals is to make big data applications easier to write. Spark has always had concise APIs in Scala...
Engineering blog

Apache Spark: A Delight for Developers

This article was cross-posted in the Cloudera developer blog. Apache Spark is well known today for its performance benefits over MapReduce...
Company blog

The Growing Apache Spark Community

October 27, 2013 by Matei Zaharia in Company Blog
This year has seen unprecedented growth in both the user and contributor communities around Apache Spark. This rapid growth validates the tremendous...
Company blog

Databricks and the Apache Spark Platform

October 27, 2013 by Ion Stoica and Matei Zaharia in Company Blog
When we announced that the original team behind Apache Spark is starting a company around the project, we got a lot of excited...