Data + AI Summit 2022

Interactive Analytics on a Massive Scale Using Delta Lake

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Lakes, Data Warehouses, Data Lakehouses

Difficulty

  • Advanced

Room

  • Moscone South | Level 2 | 202

Duration

  • 35 min

Overview

At Akamai, we make the internet fast, reliable and secure. As the industry leader and largest CDN provider, we serve traffic for some of the world's largest enterprises, which in turn creates major challenges around data ingestion and analytics. In this talk, we will present the challenges we faced while building a Delta Lake of all the security events occurring on the Akamai network, and how we tackled them.
The major challenge we will focus on is how we improved the latency of interactive queries, even those scanning huge amounts of data, down to just a few seconds and sometimes under a second. Specifically, we will focus on three areas that we identified as bottlenecks:
  • Delta Log Scan - Every query against a Delta table starts with scanning the Delta log. Reducing the overhead of this scan to tens of milliseconds was one key to success.
  • Caching - One of the best ways to achieve a fast response time is to use the Delta Cache. We will show the steps we took to drastically improve cache utilization and the cache hit rate.
  • Storage - Cloud storage services usually limit egress bandwidth. Reading "too much" data in a short period of time will usually result in throttling errors, which slow the overall response time. We will share how to spot these errors and the steps we took to reduce them.

Session Speakers

Hagai Attias

Senior Software Architect

Akamai Technologies
