Company blog

Congratulations to Summer Hackathon Winners!

September 22, 2023 by Karen Bajza and Denny Lee in Company Blog
Earlier this year, Databricks launched Dolly 2.0: the world's first truly open instruction-tuned Large Language Model (LLM). To build off this excitement...
Engineering blog

Introducing English as the New Programming Language for Apache Spark

We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™...
Engineering blog

Announcing Delta Lake 3.0 with New Universal Format and Liquid Clustering

We are excited to announce Delta Lake 3.0, the next major release of the Linux Foundation open source Delta Lake Project, available in...
Company blog

Defining the Future of Data & AI: Announcing the Finalists for the 2022 Databricks Data Team OSS Award

June 15, 2022 by Denny Lee in Customers
The annual Databricks Data Team Awards recognize data teams who are harnessing the power of data and AI to deliver solutions for some...
Engineering blog

Extending Delta Sharing to Google Cloud Storage

This blog article has been cross-posted from the Delta.io blog. We are excited for the release of Delta Sharing 0.4.0 for the...
Engineering blog

Make Your Data Lakehouse Run, Faster With Delta Lake 1.1

Delta Lake 1.1 improves performance for merge operations, adds support for generated columns, and improves nested field resolution. With the tremendous contributions...
Engineering blog

The Ubiquity of Delta Standalone: Java, Scala, Hive, Presto, Trino, Power BI, and More!

The Delta Standalone library is a single-node Java library that can be used to read from and write to Delta tables. Specifically, this...
Engineering blog

Extending Delta Sharing for Azure

We are excited for the release of Delta Sharing 0.3.0, which introduces several key improvements and bug fixes, including the following features: Delta...
Engineering blog

The Foundation of Your Lakehouse Starts With Delta Lake

December 1, 2021 by Denny Lee and Vini Jaiswal in Engineering Blog
It’s been an exciting last few years with the Delta Lake project. The release of Delta Lake 1.0 as announced by Michael Armbrust...
Engineering blog

Get Your Free Copy of Delta Lake: The Definitive Guide (Early Release)

At the Data + AI Summit, we were thrilled to announce the early release of Delta Lake: The Definitive Guide, published by...
Data AI

How We Launched a Podcast: Lessons, (Minor) Mishaps & Key Takeaways

April 23, 2021 by Brooke Wenig and Denny Lee in Data Strategy
After six episodes featuring amazing leaders and practitioners in the data and AI community, we wrapped up season 1 of Data Brew by...
Engineering blog

Attack of the Delta Clones (Against Disaster Recovery Availability Complexity)

April 20, 2021 by Itai Weiss and Denny Lee in Engineering Blog
Get an early preview of O'Reilly's new ebook for the step-by-step guidance you need to start using Delta Lake. Notebook: Using Deep Clone...
Engineering blog

Automatically Evolve Your Nested Column Schema, Stream From a Delta Table Version, and Check Your Constraints

We recently announced the release of Delta Lake 0.8.0, which introduces schema evolution and performance improvements in merge and operational metrics in...
Engineering blog

Natively Query Your Delta Lake With Scala, Java, and Python

Today, we’re happy to announce that you can natively query your Delta Lake with Scala and Java (via the Delta Standalone Reader)...
Company blog

How Scribd Uses Delta Lake to Enable the World's Largest Digital Library

Scribd uses Delta Lake...
Engineering blog

Enabling Spark SQL DDL and DML in Delta Lake on Apache Spark 3.0

Last week, we had...
Engineering blog

Time Traveling with Delta Lake: A Retrospective of the Last Year

June 18, 2020 by Burak Yavuz and Denny Lee in Engineering Blog
Try out Delta Lake...
Engineering blog

Schema Evolution in Merge Operations and Operational Metrics in Delta Lake

Try this notebook to...
Engineering blog

COVID-19 Datasets Now Available on Databricks: How the Data Community Can Help

April 14, 2020 by Denny Lee in Engineering Blog
Initially published April 14th, 2020; updated April 21st, 2020. With the massive disruption of the current COVID-19 pandemic, many data engineers and data...
Engineering blog

Query Delta Lake Tables from Presto and Athena, Improved Operations Concurrency, and Merge Performance

January 29, 2020 by Tathagata Das and Denny Lee in Engineering Blog
We are excited to...
Engineering blog

Using AutoML Toolkit's FamilyRunner Pipeline APIs to Simplify and Automate Loan Default Predictions

November 5, 2019 by Jas Bali and Denny Lee in Engineering Blog
Try this Loan Risk with AutoML Pipeline API Notebook in Databricks. In the post Using AutoML Toolkit to Automate Loan Default Predictions...
Engineering blog

Simple, Reliable Upserts and Deletes on Delta Lake Tables using Python APIs

October 3, 2019 by Tathagata Das and Denny Lee in Engineering Blog
We are excited to announce the release of Delta Lake 0.4.0 which introduces Python APIs for manipulating and managing data in Delta tables...
Company blog

Diving Into Delta Lake: Schema Enforcement & Evolution

September 24, 2019 by Burak Yavuz, Brenner Heintz and Denny Lee in Company Blog
Try this notebook series in Databricks. Data, like our experiences, is always evolving and accumulating. To keep up, our mental models of the...
Company blog

Using AutoML Toolkit to Automate Loan Default Predictions

September 10, 2019 by Benjamin Wilson, Amy Wang and Denny Lee in Company Blog
Download the following notebooks and try the AutoML Toolkit today: Evaluating Risk for Loan Approvals using XGBoost (0.90) | Using AutoML Toolkit to...
Engineering blog

Productionizing Machine Learning with Delta Lake

August 14, 2019 by Brenner Heintz and Denny Lee in Engineering Blog
Try out this notebook...
Company blog

Simplifying Streaming Stock Analysis using Delta Lake and Apache Spark: On-Demand Webinar and FAQ Now Available!

On June 13th, we...
Company blog

Detecting Financial Fraud at Scale with Decision Trees and MLflow on Databricks

Try this notebook in Databricks. Detecting fraudulent patterns at scale using artificial intelligence is a challenge, no matter the use case. The massive...
Engineering blog

Applying your Convolutional Neural Network: On-Demand Webinar and FAQ Now Available!

November 13, 2018 by Denny Lee and Cyrielle Simeone in Engineering Blog
Try this notebook in Databricks. On October 25th, we hosted a live webinar, "Applying your Convolutional Neural Network," with Denny Lee, Technical Product...
Engineering blog

Simplifying Change Data Capture with Databricks Delta

October 29, 2018 by Ameet Kini and Denny Lee in Engineering Blog
Note: We also recommend...
Engineering blog

Training your Neural Network: On-Demand Webinar and FAQ Now Available!

October 22, 2018 by Denny Lee and Cyrielle Simeone in Engineering Blog
Try this notebook in Databricks. On October 9th, we hosted a live webinar, "Training your Neural Network," on Data Science Central with Denny...
Company blog

MLflow v0.7.0 Features New R API by RStudio

Today, we’re excited to announce MLflow v0.7.0, released with new features, including a new MLflow R client API contributed by RStudio...
Engineering blog

Introduction to Neural Networks: On-Demand Webinar and FAQ Now Available!

October 1, 2018 by Denny Lee and Cyrielle Simeone in Engineering Blog
Try this notebook in Databricks. On September 27th, we hosted a live webinar, "Introduction to Neural Networks," with Denny Lee, Technical Product Marketing...
Engineering blog

Simplify Market Basket Analysis using FP-growth on Databricks

September 18, 2018 by Bhavin Kukadia and Denny Lee in Engineering Blog
When providing recommendations to shoppers on what to purchase, you are often looking for items that are frequently purchased together (e.g. peanut butter...
Company blog

Identify Suspicious Behavior in Video with Databricks Runtime for Machine Learning

September 13, 2018 by Raela Wang and Denny Lee in Company Blog
With the exponential growth of cameras and visual recordings, it is becoming increasingly important to operationalize and automate the process of video identification...
Platform blog

MLflow On-Demand Webinar and FAQ Now Available!

September 12, 2018 by Matei Zaharia and Denny Lee in Product
On August 30th, our team hosted a live webinar, "Introducing MLflow: Infrastructure for a complete Machine Learning lifecycle," with Matei Zaharia, Co-Founder and...
Company blog

Building a Real-Time Attribution Pipeline with Databricks Delta

August 9, 2018 by Caryl Yuhas and Denny Lee in Company Blog
In digital advertising, one...
Company blog

Loan Risk Analysis with XGBoost and Databricks Runtime for Machine Learning

August 9, 2018 by Amy Wang and Denny Lee in Company Blog
Try this notebook series in Databricks. For companies that make money off of interest on loans held by their customers, it’s always about...
Company blog

MLflow 0.4.2 Released

August 8, 2018 by Aaron Davidson and Denny Lee in Company Blog
Today, we’re excited to announce MLflow v0.4.0, v0.4.1, and v0.4.2, which we released within the last week with some of the recently...
Platform blog

Simplify Advertising Analytics Click Prediction with Databricks Unified Analytics Platform

July 19, 2018 by Tony Cruz and Denny Lee in Product
Read Rise of the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data...
Platform blog

Simplify Streaming Stock Data Analysis Using Databricks Delta

July 19, 2018 by John O'Dwyer and Denny Lee in Product
Traditionally, real-time analysis of stock data was a complicated endeavor due to the complexities of maintaining a streaming system and ensuring transactional consistency...
Engineering blog

Make Your Oil and Gas Assets Smarter by Implementing Predictive Maintenance with Databricks

July 19, 2018 by Don Hillborn and Denny Lee in Engineering Blog
How to build an end-to-end predictive data pipeline with Databricks Delta and Spark Streaming. Maintaining assets such as compressors is an extremely complex...
Platform blog

Analyze Games from European Soccer Leagues with Apache Spark and Databricks

July 9, 2018 by Abhinav Garg and Denny Lee in Product
Try this notebook series in Databricks. The global sports market is huge, comprising players, teams, leagues, fan clubs, sponsors, etc., and...
Platform blog

Build a Mobile Gaming Events Data Pipeline with Databricks Delta

July 2, 2018 by Steven Yu and Denny Lee in Product
How to build an end-to-end data pipeline with Structured Streaming. Try this notebook in Databricks. The world of mobile gaming is fast-paced...
Platform blog

Announcing RStudio and Databricks Integration

At Databricks, we are thrilled to announce the integration of RStudio with the Databricks Unified Analytics Platform. You can try it out now...
Company blog

Introducing Getting Started with Apache Spark on Databricks

June 30, 2016 by Jules Damji and Denny Lee in Company Blog
We are proud to introduce the Getting Started with Apache Spark on Databricks Guide. This step-by-step guide illustrates how to leverage the...
Engineering blog

Apache Spark Key Terms, Explained

June 22, 2016 by Jules Damji and Denny Lee in Engineering Blog
This article was originally posted on KDnuggets. The Spark Summit Europe call for presentations is open; submit your idea today. As observed in...
Company blog

Another Record-Setting Spark Summit

The lure of San Francisco is indisputable, as is its position as the preeminent high-tech hub. On day one of Spark Summit 2016...
Engineering blog

On-Time Flight Performance with GraphFrames for Apache Spark

Graph structures are a more intuitive approach to many classes of data problems. Whether traversing social networks, restaurant recommendations, or flight paths...
Company blog

Findify’s Smart Search Gets Smarter with Apache Spark MLlib and Databricks

February 12, 2016 by Denny Lee in Company Blog
Spark Summit East is just around the corner! If you haven’t registered yet, you can get tickets here with this promo code for...
Company blog

An Illustrated Guide to Advertising Analytics

February 2, 2016 by Grega Kešpret and Denny Lee in Company Blog
To learn the latest developments in Apache Spark, register today to join the Spark community at Spark Summit in New York City! This...
Company blog

Spark Summit East 2016 Agenda is now available

January 13, 2016 by Scott Walent and Denny Lee in Company Blog
This February, join the Apache Spark community in New York City at the New York Midtown Hilton for the second annual Spark Summit...
Company blog

Databricks 2015 Year In Review: Democratizing Access to Data

To learn more about Apache Spark, attend Spark Summit East in New York in February 2016. 2015 has been a phenomenal year...
Company blog

Databricks launches Meetup-in-a-box for Apache Spark Meetup Organizers

One of the most important reasons for the growth of Apache Spark is the amazing grassroots community interest and support to share, teach...
Company blog

Spark Survey 2015 Results are now available

September 24, 2015 by Matei Zaharia, Patrick Wendell and Denny Lee in Company Blog
We ran the Spark Survey 2015 this summer to gain insights on how organizations are using Apache Spark. The results of this year’s...
Company blog

Spark Summit Europe Full Agenda Available Online

August 31, 2015 by Scott Walent and Denny Lee in Company Blog
This October, join the Apache Spark community in Amsterdam at the Beurs van Berlage for the very first Spark Summit in Europe! We...
Company blog

Announcing SparkHub: A Community Site for Apache Spark

July 10, 2015 by Denny Lee in Company Blog
Today, we are happy to announce SparkHub, a service for the Apache Spark community to easily find the most relevant Spark resources...
Company blog

Databricks and IBM Collaborate to Enhance Apache Spark Machine Learning

June 15, 2015 by Denny Lee in Company Blog
At today’s Spark Summit, Databricks and IBM announced a joint effort to contribute key machine learning capabilities to the Apache Spark Project...
Company blog

Simplify Machine Learning on Apache Spark with Databricks

June 4, 2015 by Denny Lee in Company Blog
As many data scientists and engineers can attest, the majority of the time is spent not on the models themselves but on the...