
Reduce Time to Decision With the Databricks Lakehouse Platform and Latest Intel 3rd Gen Xeon Scalable Processors

Up to 3.0x price/performance benefits and a 6.7x speedup¹

This is a collaborative post from Databricks and Intel. We thank Swastik Chakroborty, Regional Technical Sales Director-APJ, and Lakshman Chari, Cloud ISV Partner Manager, both of Intel, for their contributions.

 
The Databricks Lakehouse Platform combines the openness, scalability and flexibility of data lakes with the reliability, governance and performance of data warehouses. In this blog, we look at performance using Databricks Photon, which applies the latest techniques in vectorized query processing, together with the latest Intel 3rd Gen Xeon Scalable processors, which include Intel Advanced Vector Extensions 512 (Intel® AVX-512).

Before we dive into the numbers and the price/performance improvements, let’s take a moment to consider why they matter. As the volume of your data grows, and delivering insights and making decisions quickly becomes a competitive advantage, the need to process that data quickly grows even faster. Optimizing and refactoring queries or code can speed up workloads, but analysts should be able to focus on functional intent and business questions rather than on query optimization. So how do you ensure that results improve over time?

When you choose the Databricks Lakehouse Platform, you are choosing a platform that, together with our partners, consistently delivers improvements that give our customers the best value.

To examine these benefits in action, we ran a test derived from the industry-standard TPC-DS power test². We examined the results³ before and after enabling Photon, and then again after switching to the latest Intel 3rd Gen Xeon Scalable processors:

Photon is the native vectorized query engine on Databricks, written to be directly compatible with Apache Spark APIs so it works with your existing code. When you enable Photon, your existing code and queries can take advantage of the latest techniques in vectorized query processing to capitalize on data- and instruction-level parallelism in CPUs. This gives Photon customers a lower TCO and faster SLAs for ETL and interactive queries.

Intel 3rd Gen Xeon Scalable processors include Intel’s latest-generation Single Instruction, Multiple Data (SIMD) instruction set, Intel® AVX-512, which boosts performance and throughput for the most demanding computational tasks such as data analytics and machine learning.

Establishing a baseline

For the baseline, we used Azure’s E8ds_v3 virtual machines, which have Intel 1st Gen Xeon Scalable processors, and Databricks Runtime (DBR) 10.3 without Photon enabled. We ran the TPC-DS benchmarks during March 2022 at both 1TB and 10TB scales on clusters of 20 workers.

20 x E8ds_v3 (Intel 1st Gen Xeon Scalable processors) workers, DBR 10.3 without Photon enabled

Metric                                       TPC-DS at 1TB   TPC-DS at 10TB
Time (s)                                     2,265           15,324
Total cost (Databricks Premium + VM costs)   $14             $98

The Photon effect

We then ran the same workload without any code changes on the same machines with Photon enabled.

20 x E8ds_v3 (Intel 1st Gen Xeon Scalable processors) workers, DBR 10.3 with Photon enabled

Metric                                       TPC-DS at 1TB   TPC-DS at 10TB
Time (s)                                     645             4,482
Total cost (Databricks Premium + VM costs)   $7              $52

Enabling Photon alone already yields a 1.9x price-performance increase and a 3.4x performance speedup over the baseline at 10TB.
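The two ratios can be reproduced from the 10TB figures in the tables above. The sketch below assumes that "performance speedup" means baseline time divided by new time, and that "price-performance increase" means baseline cost divided by new cost. The function and variable names are ours, chosen for illustration only.

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


def price_performance(baseline_cost: float, new_cost: float) -> float:
    """Return how many times cheaper the new run is than the baseline."""
    return baseline_cost / new_cost


# TPC-DS at 10TB, taken from the baseline and Photon tables above.
baseline_time_s, baseline_cost_usd = 15_324, 98.0
photon_time_s, photon_cost_usd = 4_482, 52.0

photon_speedup = speedup(baseline_time_s, photon_time_s)
photon_price_perf = price_performance(baseline_cost_usd, photon_cost_usd)

print(f"speedup: {photon_speedup:.1f}x")               # speedup: 3.4x
print(f"price-performance: {photon_price_perf:.1f}x")  # price-performance: 1.9x
```

Because the total cost already covers both the Databricks and VM charges, the cost ratio accounts for any difference in hourly price between the two configurations.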

Unleashing the full potential with Photon and Intel 3rd Gen Xeon Scalable processors

We ran the same workload again without any code changes, this time on Azure’s E8ds_v5 virtual machines, which have Intel 3rd Gen Xeon Scalable processors, with Photon enabled.

20 x E8ds_v5 (Intel 3rd Gen Xeon Scalable processors) workers, DBR 10.3 with Photon enabled

Metric                                       TPC-DS at 1TB   TPC-DS at 10TB
Time (s)                                     334             2,271
Total cost (Databricks Premium + VM costs)   $4.78           $32.47

That’s a 3.0x price-performance increase and a 6.7x performance speedup over our baseline at 10TB.

Putting it all together

By enabling Databricks Photon and using Intel's 3rd Gen Xeon Scalable processors, without making any code modifications, we cut costs by about two-thirds on our TPC-DS benchmark at 10TB and ran it 6.7 times faster. This translates not only into cost savings but also into reduced time-to-insight.
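As a quick check of these summary figures, the sketch below recomputes the speedup and the cost saving at 10TB for each configuration relative to the baseline. The dictionary layout, labels and helper name are our own. The numbers come from the result tables in this post.

```python
# (time in seconds, total cost in USD) for TPC-DS at 10TB, from the tables in this post.
RESULTS_10TB = {
    "v3, no Photon (baseline)": (15_324, 98.00),
    "v3, Photon": (4_482, 52.00),
    "v5, Photon": (2_271, 32.47),
}


def compare_to_baseline(results, baseline_key):
    """Return speedup and fractional cost saving of every run versus the baseline."""
    base_time, base_cost = results[baseline_key]
    summary = {}
    for name, (time_s, cost) in results.items():
        summary[name] = {
            "speedup": base_time / time_s,
            "cost_saving": 1 - cost / base_cost,
        }
    return summary


summary = compare_to_baseline(RESULTS_10TB, "v3, no Photon (baseline)")
for name, row in summary.items():
    print(f"{name:26s} {row['speedup']:.1f}x faster, {row['cost_saving']:.0%} cheaper")
```

For the E8ds_v5 run with Photon this gives a speedup of about 6.7x and a cost saving of roughly 67%, in line with the two-thirds quoted above.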

Learn more at

databricks.com/lakehouse
databricks.com/photon
intel.com/xeonscalable
intel.com/avx512


Footnotes

1 The 3.0x price/performance benefit and 6.7x speedup are relative to the same TPC-DS 10TB benchmark run on Intel 1st Gen Xeon Scalable processors with DBR 10.3 and without Photon enabled.

2 Derived from the power test, which consists of all 99 TPC-DS queries run in sequential order within a single stream.

3 The results shown are not comparable to an official, audited TPC benchmark.
