Automate Your Delta Lake or Practical Insights on Building Distributed Data Mesh
- Data Engineering
- Moscone South | Upper Mezzanine | 155
- 35 min
We all live in exciting times amid the hype of the Distributed Data Mesh (or just mess). This talk covers a couple of architectural and organizational approaches to achieving a Distributed Data Mesh, which is essentially a combination of mindset, fully automated infrastructure, continuous integration for data pipelines, dedicated departmental or team collaboration environments, and security enforcement.
As a Data Leader, you’ll learn what you need to pay attention to when starting (or reviving) a modern Data Engineering and Data Science strategy, and how Databricks Unity Catalog may help you automate it. Interacting with infrastructure engineers may take a significant chunk of time initially, and you’ll learn how to make that collaboration smoother.
As a DevOps engineer, you’ll learn about the best practices and pitfalls of Continuous Deployment on Databricks with Terraform and Continuous Integration with Databricks Repos. You’ll see how to automate data security with Unity Catalog and Terraform, and you’ll hear practical tips and tricks for structuring development and production environments.
As a Data Scientist, you’ll learn how to get the relevant infrastructure into “production” relatively quickly. You’ll be excited to see a striking similarity between Infrastructure as Code and Spark DataFrames. You’ll learn just enough Terraform to declaratively codify your desired Databricks environments.
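As a taste of what declaratively codifying a Databricks environment looks like, here is a minimal Terraform sketch using the Databricks provider; the catalog name and group are illustrative assumptions, not part of the talk itself:

```hcl
# Sketch: a Unity Catalog catalog and its access grants, managed as code.
# "sandbox" and "data-scientists" are hypothetical names for illustration.
resource "databricks_catalog" "sandbox" {
  name    = "sandbox"
  comment = "Team sandbox catalog, managed by Terraform"
}

resource "databricks_grants" "sandbox" {
  catalog = databricks_catalog.sandbox.name

  grant {
    principal  = "data-scientists"
    privileges = ["USE_CATALOG", "CREATE_SCHEMA"]
  }
}
```

A `terraform plan` against such a configuration shows the desired end state as a diff, much like a lazily evaluated Spark DataFrame describes a computation before it runs.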
The talk will be sprinkled throughout with diagrams and Terraform code snippets to keep you entertained.