Platform blog

Data Intelligence Platforms

The observation that "software is eating the world" has shaped the modern tech industry. Today, software is ubiquitous in our lives...
Engineering blog

Announcing Delta Lake 3.0 with New Universal Format and Liquid Clustering

We are excited to announce Delta Lake 3.0, the next major release of the Linux Foundation open source Delta Lake Project, available in...
Engineering blog

Project Lightspeed Update - Advancing Apache Spark Structured Streaming

In this blog post, we will review the advancements in Spark Structured Streaming since we announced Project Lightspeed a year ago, from performance...
Platform blog

Introducing Materialized Views and Streaming Tables for Databricks SQL

We are thrilled to announce that materialized views and streaming tables are now publicly available in Databricks SQL on AWS and Azure. Streaming...
Engineering blog

Latency goes subsecond in Apache Spark Structured Streaming

Apache Spark Structured Streaming is the leading open source stream processing platform. It is also the core technology that powers streaming on the...
Company blog

Open Sourcing All of Delta Lake

The theme of this year's Data + AI Summit is that we are building the modern data stack with the lakehouse. A fundamental...
Platform blog

Delta Live Tables Announces New Capabilities and Performance Optimizations

June 29, 2022 by Paul Lappas and Michael Armbrust in Product
Since the availability of Delta Live Tables (DLT) on all clouds in April (announcement), we've introduced new features to make development...
Engineering blog

Project Lightspeed: Faster and Simpler Stream Processing With Apache Spark

Streaming data is a critical area of computing today. It is the basis for making quick decisions on the enormous amounts of incoming...
Platform blog

Announcing General Availability of Databricks’ Delta Live Tables (DLT)

Today, we are thrilled to announce that Delta Live Tables (DLT) is generally available (GA) on the Amazon AWS and Microsoft Azure clouds...
Platform blog

Databricks Delta Live Tables Announces Support for Simplified Change Data Capture

As organizations adopt the data lakehouse architecture, data engineers are looking for efficient ways to capture continually arriving data. Even with the right...
Platform blog

Frequently Asked Questions About the Data Lakehouse

Question Index What is a Data Lakehouse? What is a Data Lake? What is a Data Warehouse? How is a Data Lakehouse different...
Platform blog

Announcing the Launch of Delta Live Tables: Reliable Data Engineering Made Easy

As the amount of data, data sources and data types at organizations grows, building and maintaining reliable data...
Platform blog

Introducing Delta Sharing: An Open Protocol for Secure Data Sharing

Update: Delta Sharing is now generally available on AWS and Azure. Get an early preview of O'Reilly's new ebook for the step-by-step guidance...
Engineering blog

What Is a Lakehouse?

Read Building the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data warehouse...
Company blog

Delta Lake Now Hosted by the Linux Foundation to Become the Open Standard for Data Lakes

October 16, 2019 by Michael Armbrust and Reynold Xin in Company Blog
At today’s Spark +...
Company blog

Diving Into Delta Lake: Unpacking The Transaction Log

The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important...
Company blog

How to Work with Avro, Kafka, and Schema Registry in Databricks

February 15, 2019 by Wenchen Fan and Michael Armbrust in Company Blog
In the previous blog post, we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can...
Platform blog

How to Avoid Drowning in GDPR Data Subject Requests in a Data Lake

With GDPR enforcement rapidly...
Engineering blog

Introducing Low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3

Structured Streaming in Apache Spark 2.0 decoupled micro-batch processing from its high-level APIs for a couple of reasons...
Company blog

Databricks Delta: A Unified Data Management System for Real-time Big Data

Combining the best of data warehouses, data lakes and streaming. For an in-depth look and demo, join the webinar. Today we are...
Engineering blog

Introducing Apache Spark 2.2

Today we are happy to announce the availability of Apache Spark 2.2.0 on Databricks as part of the Databricks Runtime 3.0. This release...
Engineering blog

Processing Data in Apache Kafka with Structured Streaming in Apache Spark 2.2

This is the third post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. In this blog...
Engineering blog

Working with Complex Data Formats with Structured Streaming in Apache Spark 2.1

In part 1 of this series on Structured Streaming blog posts, we demonstrated how easy it is to write an end-to-end streaming ETL...
Engineering blog

Real-time Streaming ETL with Structured Streaming in Apache Spark 2.1

Explore why lakehouses are the data architecture of the future with the father of the data warehouse, Bill Inmon. Try this notebook in...
Engineering blog

Introducing Apache Spark Datasets

Developers have always loved Apache Spark for providing APIs that are simple yet powerful, a combination of traits that makes complex analysis possible...
Engineering blog

Apache Spark 1.5 DataFrame API Highlights: Date/Time/String Handling, Time Intervals, and UDAFs

To try the new features highlighted in this blog post, download Spark 1.5 or sign up for a 14-day free trial of Databricks today...
Engineering blog

Introducing Window Functions in Spark SQL

In this blog post...
Engineering blog

Deep Dive into Spark SQL's Catalyst Optimizer

Check out the Why the Data Lakehouse is Your Next Data Warehouse ebook to discover the inner workings of the Databricks Lakehouse Platform...
Engineering blog

What's new for Spark SQL in Apache Spark 1.3

March 24, 2015 by Michael Armbrust in Engineering Blog
Read Rise of the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data...
Engineering blog

Introducing DataFrames in Apache Spark for Large Scale Data Science

Today, we are excited to announce a new DataFrame API designed to make big data processing even easier for a wider audience. When...
Engineering blog

Spark SQL Data Sources API: Unified Data Access for the Apache Spark Platform

January 9, 2015 by Michael Armbrust in Engineering Blog
Engineering blog

Spark SQL: Manipulating Structured Data Using Apache Spark
