Session

Best Practices for Disaster Recovery and Resilience on Databricks

Overview

Experience: In Person
Track: Governance & Security
Industry: Enterprise Technology
Technologies: Databricks SQL, Unity Catalog
Skill Level: Beginner

When a region goes down, every minute counts. But building disaster recovery for a lakehouse is often complex, requiring coordination across data, metadata, and workspace assets.

In this session, we’ll share best practices for designing resilient data and AI platforms on Databricks. Learn how to think about recovery objectives, including recovery point objective (RPO) and recovery time objective (RTO), and how to architect for cross-region availability across your data, governance layer, and production workloads.

We’ll walk through common patterns for replication, failover, and recovery, along with tradeoffs between cost, complexity, and performance. You’ll also see how to reduce operational overhead, avoid brittle DIY approaches, and ensure applications can recover quickly when disruptions occur.

Whether you’re supporting mission-critical analytics or production AI systems, walk away with practical guidance to design, test, and operate disaster recovery with confidence.

Session Speakers

Sirui Sun

Sr. Manager, Product Management
Databricks

Bart Samwel

Principal Software Engineer
Databricks