
Databricks SQL accelerates customer workloads by 5x in just three years

New features announced today deliver an additional 25% boost automatically


Published: June 12, 2025

Announcements · 3 min read

Summary

  • 5x performance improvement for real-world customer workloads achieved since 2022
  • New release increases performance by an additional 25%—automatically, with no price change
  • Predictive Query Execution delivers faster queries with a continuous feedback loop inside the query engine
  • Photon Vectorized Shuffle delivers 1.5x higher shuffle throughput

Since 2022, Databricks SQL (DBSQL) Serverless has delivered a 5x performance gain across real-world customer workloads—turning a 100-second dashboard into a 20-second one. That acceleration came from continuous engine improvements, all delivered automatically and without performance tuning.

[Figure: 5x performance increase in DBSQL Serverless]

Today, we’re adding even more. With the launch of Predictive Query Execution and Photon Vectorized Shuffle, queries get up to 25% faster on top of the existing 5x gains, bringing that 20-second dashboard down to around 15 seconds. These new engine improvements roll out automatically across all DBSQL Serverless warehouses, at zero additional cost.

[Figure: Performance improvements of 25 percent]

Predictive Query Execution: From reactive recovery to real-time control

When it launched in Apache Spark, Adaptive Query Execution (AQE) was a big step forward. It allowed queries to re-plan based on actual data sizes as the query was executed. However, it had one major limitation: it could only act after a query execution stage was completed. That delay meant problems like data skew or excessive spilling often weren’t caught until it was too late.

Predictive Query Execution (PQE) changes that. It introduces a continuous feedback loop inside the query engine:

  • It monitors running tasks in real time, collecting metrics like spill size and CPU usage.
  • A lightweight, intelligent decision layer determines whether to intervene.
  • If needed, PQE cancels and replans the stage on the spot, avoiding wasted work and improving stability.
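The feedback loop described above can be sketched in a few lines of Python. This is a toy illustration with hypothetical names (`should_replan`, `run_stage`, the spill threshold), not the real engine: it monitors per-tick task metrics, applies a lightweight policy, and replans a stage mid-flight instead of waiting for the stage to finish.

```python
# Minimal sketch (hypothetical names; not the real engine) of PQE's
# continuous feedback loop: monitor task metrics in real time, apply a
# lightweight policy, and cancel/replan a stage before it completes.

SPILL_LIMIT_BYTES = 512 * 1024 * 1024  # illustrative threshold


def should_replan(task_metrics):
    """Lightweight policy: intervene if any running task spills too much."""
    return any(m["spill_bytes"] > SPILL_LIMIT_BYTES for m in task_metrics)


def run_stage(plan, metric_snapshots):
    """Execute a stage, replanning mid-flight if live metrics degrade."""
    for metrics in metric_snapshots:        # one snapshot per monitoring tick
        if should_replan(metrics):
            return f"replanned({plan})"     # cancel and re-plan on the spot
    return f"completed({plan})"


# A skewed task spills 2 GB mid-stage, so the stage is replanned early
# rather than after the stage boundary, as AQE would have to do.
ticks = [
    [{"spill_bytes": 0}, {"spill_bytes": 0}],
    [{"spill_bytes": 2 * 1024**3}, {"spill_bytes": 0}],
]
print(run_stage("shuffle-join", ticks))  # replanned(shuffle-join)
```

The contrast with AQE is the loop body: the check runs on every monitoring tick while tasks are still executing, not once at a stage boundary.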


The result? Faster queries, fewer surprises, and more predictable performance—especially for complex pipelines and mixed workloads.

Photon Vectorized Shuffle: Faster queries, smarter design

Photon is a native C++ engine that processes data in columnar batches, vectorized to leverage modern CPUs and execute SQL queries several times faster. Shuffle operations, which restructure large datasets between stages, remain among the heaviest in query processing. 

Historically, shuffle operations have been among the hardest to optimize because they involve many random memory accesses, and it is rarely possible to reduce the number of those accesses without rewriting the data. Our key intuition was that instead of reducing the number of random accesses, we could reduce the distance in memory between successive accesses.

This led us to rewrite Photon's shuffle from the ground up as a column-based shuffle, for higher cache and memory efficiency.

The result is a shuffle component that moves data efficiently, executes fewer instructions, and is cache-conscious. With the newly optimized shuffle, we see 1.5× higher throughput in CPU-bound workloads such as large joins.
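The access-pattern idea behind the rewrite can be illustrated with a toy sketch (hypothetical layout and function names, not Photon's actual implementation): rather than scattering whole rows to partitions one at a time, compute the partition ids once, then copy one column at a time, so consecutive writes stay within a single array and land close together in memory.

```python
# Toy sketch (hypothetical names; not Photon's implementation) of a
# column-based shuffle: partition ids are computed once per batch, then
# each column is partitioned in its own pass, keeping successive random
# writes close together in memory instead of interleaved across columns.

def partition_of(key, num_partitions):
    return hash(key) % num_partitions


def shuffle_columnar(columns, key_col, num_partitions):
    """columns: dict of name -> list of values (one columnar batch)."""
    pids = [partition_of(k, num_partitions) for k in columns[key_col]]
    out = [{name: [] for name in columns} for _ in range(num_partitions)]
    # One column at a time: each pass touches a single contiguous array,
    # shrinking the distance between successive random accesses.
    for name, values in columns.items():
        for pid, value in zip(pids, values):
            out[pid][name].append(value)
    return out


batch = {"key": [1, 2, 3, 4], "amount": [10.0, 20.0, 30.0, 40.0]}
parts = shuffle_columnar(batch, "key", num_partitions=2)
```

In a real native engine the per-column passes operate on contiguous buffers, which is where the cache-efficiency gain actually comes from; Python lists only illustrate the ordering of the accesses.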

Key takeaways

  • Get up to 25% faster queries—automatically.
    Internal TPC-DS benchmarks and real customer workloads show consistent latency improvements, with no tuning required.
  • No configuration, no redeploy—just results.
    The upgrades are rolling out now across DBSQL Serverless warehouses. You don’t have to change a single setting.
  • Biggest wins on CPU-bound workloads.
    Pipelines with heavy joins or funnel logic see the most dramatic improvements, often cutting minutes off total runtime.

Getting started

This upgrade is rolling out now across all DBSQL Serverless warehouses—no action needed.

Haven’t tried DBSQL Serverless yet? Now’s the perfect time. Serverless is the easiest way to run analytics on the Lakehouse:

  • No infrastructure to manage
  • Instantly elastic
  • Optimized for performance out of the box

Just create a DBSQL Serverless warehouse and start querying—zero tuning required. If you are not already using Databricks SQL, read more on enabling serverless SQL warehouses.
