Data + AI Summit 2023
JUNE 26-29, 2023
SAN FRANCISCO + VIRTUAL

Moving to the Lakehouse: Fast & Efficient Ingestion with Auto Loader

On Demand

Type

  • Session

Format

  • Virtual

Track

  • Data Lakes, Data Warehouses and Data Lakehouses

Difficulty

  • Intermediate

Duration

  • 0 min

Overview

Auto Loader, the most popular tool for incremental data ingestion from cloud storage into the Databricks Lakehouse, is used in our biggest customers’ ingestion workflows. It is our all-in-one solution for exactly-once processing, offering efficient file discovery, schema inference and evolution, and fault tolerance.



In this talk, we delve into key features of Auto Loader, including:

• Avro schema inference

• Rescued column

• Semi-structured data support

• Incremental listing

• Asynchronous backfilling

• Native listing

• File-level tracking and observability
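
As a rough illustration of how several of the features above surface as reader options, here is a minimal sketch in Python. It is not taken from the session itself: the helper names `auto_loader_options` and `read_with_auto_loader`, the example paths, and the chosen option values are assumptions made for illustration. The option keys are ones documented for the `cloudFiles` source, but their availability and defaults vary by Databricks Runtime version, so check the documentation for your runtime.

```python
from typing import Dict, Optional


def auto_loader_options(
    file_format: str,
    schema_location: str,
    backfill_interval: Optional[str] = None,
) -> Dict[str, str]:
    """Build a dict of Auto Loader (cloudFiles) reader options.

    Hypothetical helper for illustration; the values are example choices,
    not recommendations from the talk.
    """
    options = {
        # Source file format, e.g. "avro", "json", "csv", "parquet".
        "cloudFiles.format": file_format,
        # Where the inferred schema and its later versions are stored.
        "cloudFiles.schemaLocation": schema_location,
        # Schema evolution: add newly seen columns to the tracked schema.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        # Infer column types instead of treating every column as a string.
        "cloudFiles.inferColumnTypes": "true",
        # Incremental listing: let Auto Loader decide when it applies.
        "cloudFiles.useIncrementalListing": "auto",
        # Rescued column: name of the column holding data that did not
        # match the schema.
        "rescuedDataColumn": "_rescued_data",
    }
    if backfill_interval is not None:
        # Asynchronous backfilling: periodically re-list the source
        # to pick up any files that were missed.
        options["cloudFiles.backfillInterval"] = backfill_interval
    return options


def read_with_auto_loader(spark, source_path: str, options: Dict[str, str]):
    """Return a streaming DataFrame over source_path.

    Assumes a Databricks runtime, where the "cloudFiles" source is
    available; this call fails on plain open-source Spark.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )


# Example option set for Avro input with a daily backfill (paths are made up).
example_options = auto_loader_options(
    "avro", "/tmp/schemas/events", backfill_interval="1 day"
)
```

On a Databricks cluster, `read_with_auto_loader(spark, "/mnt/raw/events", example_options)` would return a streaming DataFrame that can be written to a Delta table with a checkpoint location; the source path here is likewise a made-up example.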



Auto Loader is also used in other Databricks features such as Delta Live Tables. We will discuss the architecture, provide a demo, and feature an Auto Loader customer speaking about their experience migrating to Auto Loader.

Session Speakers

Benyue Liu

Databricks

Eric Maynard

Databricks
