Best Practices for Disaster Recovery and Resilience on Databricks
Overview
| Experience | In Person |
|---|---|
| Track | Governance & Security |
| Industry | Enterprise Technology |
| Technologies | Databricks SQL, Unity Catalog |
| Skill Level | Beginner |
When a region goes down, every minute counts. But building disaster recovery for a lakehouse is often complex, requiring coordination across data, metadata, and workspace assets.
In this session, we’ll share best practices for designing resilient data and AI platforms on Databricks. Learn how to reason about recovery objectives, including recovery point objective (RPO, how much data loss is tolerable) and recovery time objective (RTO, how much downtime is tolerable), and how to architect for cross-region availability across your data, governance layer, and production workloads.
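As a rough illustration of the recovery objectives mentioned above, the sketch below measures an incident against assumed RPO/RTO targets. The function name, targets, and timestamps are all hypothetical, chosen only to make the two objectives concrete:

```python
from datetime import datetime, timedelta

# Illustrative targets (assumptions, not Databricks recommendations):
# RPO bounds acceptable data loss, RTO bounds acceptable downtime.
RPO_TARGET = timedelta(minutes=15)
RTO_TARGET = timedelta(hours=1)

def meets_objectives(last_replication: datetime,
                     failure: datetime,
                     recovered: datetime) -> dict:
    """Check one incident's achieved RPO/RTO against the targets."""
    # Data written after the last successful replication is lost.
    achieved_rpo = failure - last_replication
    # Time from the failure until the platform was serving again.
    achieved_rto = recovered - failure
    return {
        "achieved_rpo": achieved_rpo,
        "achieved_rto": achieved_rto,
        "rpo_met": achieved_rpo <= RPO_TARGET,
        "rto_met": achieved_rto <= RTO_TARGET,
    }

result = meets_objectives(
    last_replication=datetime(2025, 6, 1, 11, 50),
    failure=datetime(2025, 6, 1, 12, 0),
    recovered=datetime(2025, 6, 1, 12, 40),
)
# 10 minutes of potential data loss, 40 minutes of downtime:
# both within the assumed targets.
```

Tightening either target typically raises cost and complexity (more frequent replication, warmer standby capacity), which is exactly the tradeoff space the session covers.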
We’ll walk through common patterns for replication, failover, and recovery, along with tradeoffs between cost, complexity, and performance. You’ll also see how to reduce operational overhead, avoid brittle DIY approaches, and ensure applications can recover quickly when disruptions occur.
Whether you’re supporting mission-critical analytics or production AI systems, you’ll walk away with practical guidance for designing, testing, and operating disaster recovery with confidence.
Session Speakers
Sirui Sun
Sr. Manager, Product Management
Databricks
Bart Samwel
Principal Software Engineer
Databricks