Data + AI Summit 2022
Moving to the Lakehouse: Fast & Efficient Ingestion with Auto Loader

On Demand

Type

  • Session

Format

  • Virtual

Track

  • Data Lakes, Data Warehouses and Data Lakehouses

Difficulty

  • Intermediate

Duration

  • 0 min

Overview

Auto Loader, the most popular tool for incremental data ingestion from cloud storage into the Databricks Lakehouse, is used in our biggest customers’ ingestion workflows. It is our all-in-one solution for exactly-once processing, offering efficient file discovery, schema inference and evolution, and fault tolerance.



In this talk, we delve into key features of Auto Loader, including:

• Avro schema inference

• Rescued column

• Semi-structured data support

• Incremental listing

• Asynchronous backfilling

• Native listing

• File-level tracking and observability
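As a rough illustration of how some of the features above are typically switched on, here is a minimal PySpark-style sketch. It is not taken from the talk. The option names are the `cloudFiles.*` reader options as we recall them from the Databricks documentation and should be checked against the current docs; the helper names `auto_loader_options` and `build_stream` and all paths are hypothetical.

```python
from typing import Dict


def auto_loader_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Collect reader options for an Auto Loader stream (illustrative values only)."""
    return {
        # Source file format, e.g. "avro" or "json"; the schema is inferred
        # and tracked under schemaLocation.
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # "rescue" keeps the schema fixed and routes unexpected fields
        # to the rescued data column instead of failing the stream.
        "cloudFiles.schemaEvolutionMode": "rescue",
        # Assumed setting: let Auto Loader decide when incremental listing applies.
        "cloudFiles.useIncrementalListing": "auto",
        # Assumed setting: periodically backfill files that discovery may have missed.
        "cloudFiles.backfillInterval": "1 day",
    }


def build_stream(spark, source_path: str, options: Dict[str, str]):
    """Apply the options to a streaming reader; `spark` is an existing SparkSession."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(source_path)
```

On a Databricks cluster this might be called as `build_stream(spark, "/mnt/raw/events", auto_loader_options("avro", "/mnt/schemas/events"))`, where both paths are placeholders. The `cloudFiles` source is only available on Databricks, so the second function will not run against open-source Spark.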



Auto Loader is also used in other Databricks features such as Delta Live Tables. We will discuss the architecture, provide a demo, and feature an Auto Loader customer speaking about their experience migrating to Auto Loader.

Session Speakers

Benyue Liu

Databricks

Eric Maynard

Databricks
