Moving to the Lakehouse: Fast & Efficient Ingestion with Auto Loader
On Demand
Type
- Session
Format
- Virtual
Track
- Data Lakes, Data Warehouses and Data Lakehouses
Difficulty
- Intermediate
Duration
- 0 min
Overview
Auto Loader, the most popular tool for incremental data ingestion from cloud storage into the Databricks Lakehouse, is used in our biggest customers’ ingestion workflows. It is our all-in-one solution for exactly-once processing, offering efficient file discovery, schema inference and evolution, and fault tolerance.
In this talk, we delve into key features of Auto Loader, including:
• Avro schema inference
• Rescued column
• Semi-structured data support
• Incremental listing
• Asynchronous backfilling
• Native listing
• File-level tracking and observability
Auto Loader is also used in other Databricks features such as Delta Live Tables. We will discuss the architecture, provide a demo, and feature an Auto Loader customer speaking about their experience migrating to Auto Loader.
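For readers who have not used Auto Loader, the following is a minimal sketch of how a few of the features listed above (schema inference and evolution, the rescued column, backfilling) are typically switched on through `cloudFiles` reader options. The helper names `auto_loader_options` and `build_stream`, and the paths in the usage note, are hypothetical and only for illustration; the `cloudFiles` source itself is available on Databricks, so the stream-building function is not expected to run on plain open-source Spark.

```python
from typing import Dict, Optional


def auto_loader_options(
    file_format: str,
    schema_location: str,
    rescue_new_columns: bool = True,
    backfill_interval: Optional[str] = None,
) -> Dict[str, str]:
    """Build an option map for a `cloudFiles` (Auto Loader) stream reader.

    Hypothetical helper: it only assembles a dict of option names and values.
    """
    options = {
        # Format of the incoming files, e.g. "json", "csv", "avro".
        "cloudFiles.format": file_format,
        # Where the inferred schema and its later versions are stored.
        "cloudFiles.schemaLocation": schema_location,
        # "rescue" keeps the schema fixed and routes unexpected fields to the
        # rescued data column; "addNewColumns" evolves the schema instead.
        "cloudFiles.schemaEvolutionMode": (
            "rescue" if rescue_new_columns else "addNewColumns"
        ),
    }
    if backfill_interval is not None:
        # Periodic asynchronous backfill, e.g. "1 day" (assumed value format).
        options["cloudFiles.backfillInterval"] = backfill_interval
    return options


def build_stream(spark, source_path: str, options: Dict[str, str]):
    """Create the streaming DataFrame; requires a Databricks Spark session."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)
```

On a Databricks cluster this might be used as `build_stream(spark, "/mnt/raw/events", auto_loader_options("json", "/mnt/schemas/events"))`, where both paths are placeholders.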