Databricks Lakehouse Sets the New World Record for Data Warehouse Performance

For the first time, an open data lake delivered better performance than traditional data warehouses, making the lakehouse vision a reality.

November 2, 2021

SAN FRANCISCO - October 6, 2021 - Databricks, the Data and AI company, today announced that it has set a world record for the official 100 terabyte TPC-DS benchmark, the gold standard for evaluating the performance of data warehouse systems. Unlike most other benchmark results, these results were audited and made public by the Transaction Processing Performance Council (TPC), the body that organizes TPC-DS. According to the council, Databricks outperformed the previous world record holder by 2.2x. A separate research study conducted by the Barcelona Supercomputing Center (BSC) compared Databricks and Snowflake and found that Databricks was 2.7x faster and more than an order of magnitude cheaper on the same workload. Read the Databricks blog to learn how the team built the engine that achieved these results.

For the first time, Databricks has shown that the data lakehouse architecture built on top of vast amounts of data stored in open data lakes can deliver better data warehousing performance than traditional data warehouses using proprietary data formats. This is a major validation for the lakehouse paradigm and helps prove why the data warehouse as we know it today will either cease to exist or look vastly different in the coming decade.

Traditionally, organizations have maintained two separate data stacks: a data lake for data science and machine learning, and a data warehouse for BI and SQL analytics. This has led to cost overruns, data duplication, and governance issues. To avoid these issues, organizations are increasingly pointing BI tools directly at the data lake to power their analytics, as most enterprise data is already stored there. But performance on data lakes has not met the expectations of the analyst and business community.

Databricks has been rapidly developing full-blown data warehousing capabilities directly on data lakes, bringing the best of both worlds together in one data architecture dubbed the data lakehouse. We announced our full suite of data warehousing capabilities as Databricks SQL in November 2020. The open question since then has been whether an open architecture based on the Lakehouse can match the performance, speed, and cost of classic data warehouses. This result demonstrates that the Lakehouse architecture can.

For the key innovations that enabled this new record, including Delta Lake, Photon engine, and ML-based optimizations, visit our blog.

About Databricks 

Databricks is the data and AI company. More than 5,000 organizations worldwide — including Comcast, Condé Nast, H&M, and over 40% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.

Press Contact: 

[email protected]
