Open Source | Databricks Blog

Using Apache Flink With Delta Lake

February 10, 2022 by Max Fisher, Dylan Gessner and Vini Jaiswal in Open Source

As with all parts of our platform, we are constantly raising the bar and adding new features to enhance developers’ abilities to build...

Make Your Data Lakehouse Run, Faster With Delta Lake 1.1

January 31, 2022 by Scott Sandre, Ryan Zhu, Denny Lee and Vini Jaiswal in Engineering Blog

Delta Lake 1.1 improves performance for merge operations, adds support for generated columns and improves nested field resolution. With the tremendous contributions...

The Ubiquity of Delta Standalone: Java, Scala, Hive, Presto, Trino, Power BI, and More!

January 28, 2022 by Allison Portis, Scott Sandre, Denny Lee, Venki Korukanti and Shixiong Zhu in Engineering Blog

The Delta Standalone library is a single-node Java library that can be used to read from and write to Delta tables. Specifically, this...

Creating a Faster TAR Extractor

January 26, 2022 by Christopher Denny in Engineering Blog

Tarballs are used industry-wide for packaging and distributing files, and this is no different at Databricks. Every day we launch millions of VMs...

Extending Delta Sharing for Azure

January 21, 2022 by Will Girten, Shixiong Zhu and Denny Lee in Engineering Blog

We are excited for the release of Delta Sharing 0.3.0, which introduces several key improvements and bug fixes, including the following features: Delta...

Log4j2 Vulnerability (CVE-2021-44228) Research and Assessment

December 23, 2021 by Fermin J. Serna in Engineering Blog

This blog relates to an ongoing investigation. We will update it with any significant updates, including detection rules to help people investigate potential...

Scala at Scale at Databricks

December 3, 2021 by Li Haoyi in Engineering Blog

With hundreds of developers and millions of lines of code, Databricks is one of the largest Scala shops around. This post will be...

The Foundation of Your Lakehouse Starts With Delta Lake

December 1, 2021 by Denny Lee and Vini Jaiswal in Engineering Blog

It’s been an exciting last few years with the Delta Lake project. The release of Delta Lake 1.0 as announced by Michael Armbrust...

Turning 2 Trillion Data Points of Traffic Intelligence into Critical Business Insights

November 3, 2021 by Stephanie Mak in Engineering Blog

This is a guest-authored post by Stephanie Mak, Senior Data Engineer, formerly at Intelematics. This blog post offers my experience of...

Introducing Apache Spark™ 3.2

October 19, 2021 by Gengliang Wang, Wenchen Fan, Hyukjin Kwon, Xiao Li and Reynold Xin in Engineering Blog

We are excited to announce the availability of Apache Spark™ 3.2 on Databricks as part of Databricks Runtime 10.0. We want to...