Session

Let's Save Tons of Money With Cloud-Native Data Ingestion!

Overview

Experience: In Person
Type: Breakout
Track: Data Lakehouse Architecture and Implementation
Industry: Enterprise Technology, Media and Entertainment
Technologies: Delta Lake, Apache Iceberg
Skill Level: Beginner

Delta Lake is a fantastic technology for quickly querying massive data sets, but first you need those massive data sets! In this session we will dive into the cloud-native architecture Scribd has adopted to ingest data from AWS Aurora, SQS, Kinesis Data Firehose and more. By using off-the-shelf open source tools like kafka-delta-ingest, oxbow and Airbyte, Scribd has redefined its ingestion architecture to be more event-driven, reliable, and most importantly: cheaper. No jobs needed!
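
To make the event-driven idea concrete, here is a minimal sketch, not Scribd's implementation and not the code of any of the tools named above. It assumes Parquet files land in S3, that S3 event notifications are delivered through SQS, and that each new file should become an "add" entry for a Delta table. The function name `s3_event_to_add_actions` and the `table_prefix` parameter are hypothetical, and the emitted dictionaries carry only a simplified subset of the fields a real Delta commit requires.

```python
import json
from urllib.parse import unquote_plus


def s3_event_to_add_actions(message_body: str, table_prefix: str) -> list[dict]:
    """Map one S3 event notification (SQS message body) to simplified Delta "add" actions.

    Hypothetical helper for illustration only; it does not write a commit.
    """
    event = json.loads(message_body)
    actions = []
    for record in event.get("Records", []):
        # Only newly created objects are interesting for ingestion.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        obj = record["s3"]["object"]
        key = unquote_plus(obj["key"])  # object keys arrive URL-encoded in S3 events
        if not key.startswith(table_prefix) or not key.endswith(".parquet"):
            continue
        relative = key[len(table_prefix):].lstrip("/")
        # Hive-style directories such as ds=2024-01-01 become partition values.
        partitions = dict(
            part.split("=", 1) for part in relative.split("/")[:-1] if "=" in part
        )
        actions.append(
            {
                "add": {
                    "path": relative,
                    "partitionValues": partitions,
                    "size": obj["size"],
                    "dataChange": True,
                }
            }
        )
    return actions
```

In this sketch no scheduled job scans the bucket: work happens only when a file arrives, which is the property the session describes as event-driven.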

Attendees will learn how to use third-party tools alongside a Databricks and Unity Catalog environment to build a highly efficient and available data platform. This architecture will be presented in the context of AWS but can be adapted for Azure, Google Cloud Platform, or even on-premises environments.

Session Speakers

Tyler Croy

Valued Employee
Scribd