Data + AI Summit 2022

Serverless Kafka and Apache Spark in a Multi-Cloud Data Lakehouse Architecture

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Lakes, Data Warehouses and Data Lakehouses

Difficulty

  • Intermediate

Room

  • Moscone South | Level 2 | 215

Duration

  • 35 min

Abstract

Apache Kafka in conjunction with Apache Spark has become the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable. Unfortunately, these same qualities make them hard for many teams to operate. Ideally, teams can use serverless SaaS offerings and focus on business logic instead. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operational burden.

This session explores different architectures for building serverless Kafka and Spark deployments across multiple clouds, regions, and continents. We start from the analytics perspective of a data lake and explore how it relates to a fully integrated data streaming layer built on Kafka, which together form a modern data lakehouse. Real-world use cases show the combined value of both frameworks and the benefits of the Delta Lake integration.

Session Speakers

Kai Waehner

Field CTO

Confluent
