From Snowflake to Enterprise-Scale Apache Spark™
Overview
Akamai mPulse is a real user monitoring (RUM) solution that delivers real-time web performance analytics to Akamai customers through dashboards, alerting, reporting and data science. The mPulse architecture relies on a combination of public and private cloud-based services, such as Amazon Web Services (AWS), Microsoft Azure and the Snowflake data warehouse. Snowflake has met the product's core data warehousing needs as mPulse has scaled alongside Akamai’s customer base.
The mPulse engineering team has been re-architecting the system to migrate away from Snowflake to an enterprise-scale Apache Spark™ solution that Akamai has been developing in-house, with the goal of improving performance and reducing cost. In the first half of the talk, we’ll discuss how the mPulse team made the decision to migrate, the challenges we’ve encountered and how well Spark is meeting the product's needs.
In the second half of the talk, we’ll discuss the details of the Spark-based infrastructure. The Akamai data warehouse (also known as Asgard) is a Spark-based solution running on the Azure cloud. We will describe the internal technologies and characteristics of the solution that enable it to outperform Snowflake's offering on both cost and performance. We will share our experience on how to:
- Run Spark on K8s at scale while supporting multi-tenancy and resource isolation
- Handle hundreds of queries per second on a single Spark application with sub-second query latency
- Protect a Spark application from misbehaving users
- Optimize SQL-based queries
Type
- Lightning Talk
Experience
- In Person
Track
- Data Engineering
Industry
- Enterprise Technology, Media and Entertainment
Difficulty
- Intermediate
Duration
- 20 min