Data + AI Summit 2022

Automate Your Delta Lake or Practical Insights on Building Distributed Data Mesh

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 155

Duration

  • 35 min

Overview

We all live in exciting times, amid the hype of Distributed Data Mesh (or just mess). This talk covers a couple of architectural and organizational approaches to achieving a Distributed Data Mesh, which is essentially a combination of mindset, fully automated infrastructure, continuous integration for data pipelines, dedicated departmental or team collaborative environments, and security enforcement.

As a Data Leader, you’ll learn what to pay attention to when starting (or reviving) a modern Data Engineering and Data Science strategy, and how Databricks Unity Catalog may help you automate it. Working with infrastructure engineers can take a chunk of time at first, and you’ll learn how to make that collaboration smooth.

As a DevOps engineer, you’ll learn the best practices and pitfalls of Continuous Deployment on Databricks with Terraform and of Continuous Integration with Databricks Repos. You’ll be excited to see how you can automate Data Security with Unity Catalog and Terraform, and you’ll hear practical tips and tricks for structuring development and production environments.
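To make the idea of automating data security a little more concrete, here is a minimal Python sketch, not taken from the talk (which uses Terraform for this), of keeping access rules as version-controlled data and rendering them into Unity Catalog `GRANT` statements that a CI job could apply. The `ACCESS_POLICY` mapping, the `render_grants` helper, the `main.sales` schema and the group names are all hypothetical, and the exact privilege names accepted depend on your Unity Catalog version.

```python
from typing import Dict, List, Tuple

# Hypothetical access policy: (group, securable type, securable name) -> privileges.
# In a real setup this would live in version control next to the pipeline code.
ACCESS_POLICY: Dict[Tuple[str, str, str], List[str]] = {
    ("analysts", "SCHEMA", "main.sales"): ["USE SCHEMA", "SELECT"],
    ("data-engineers", "SCHEMA", "main.sales"): ["USE SCHEMA", "SELECT", "MODIFY"],
    ("analysts", "CATALOG", "main"): ["USE CATALOG"],
}


def render_grants(policy: Dict[Tuple[str, str, str], List[str]]) -> List[str]:
    """Render the policy into GRANT statements in a deterministic order."""
    statements = []
    for (group, securable_type, name), privileges in sorted(policy.items()):
        # Sorting keeps the output stable, so diffs in code review stay small.
        privilege_list = ", ".join(sorted(privileges))
        # Backticks quote principal names that contain dashes.
        statements.append(
            f"GRANT {privilege_list} ON {securable_type} {name} TO `{group}`"
        )
    return statements


if __name__ == "__main__":
    for statement in render_grants(ACCESS_POLICY):
        print(statement)
```

Because the policy is plain data, a pull request that widens access shows up as a one-line diff that reviewers can reason about, which is the same property the talk attributes to codifying security with Terraform.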

As a Data Scientist, you’ll learn how to get relevant infrastructure into “production” faster. You’ll be excited to see a striking similarity between Infrastructure as Code and Spark DataFrames, and you’ll learn just enough Terraform to declaratively codify your desired Databricks environments.
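The similarity rests on both being declarative: you describe the result you want, and an engine works out the steps. The toy Python sketch below illustrates that idea for infrastructure (it is not how Terraform is actually implemented): given a desired and a current set of resources, a hypothetical `plan` function computes the create, update and delete actions needed to reconcile them. The resource names and attributes are made up for illustration.

```python
from typing import Dict, List, Tuple

# Resource name -> attributes; a stand-in for real provider resources.
Resources = Dict[str, Dict[str, str]]


def plan(desired: Resources, current: Resources) -> List[Tuple[str, str]]:
    """Return (action, resource name) pairs that would turn `current` into `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired.keys() | current.keys()):
        if name not in current:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != current[name]:
            actions.append(("update", name))
        # Resources that already match the desired state need no action.
    return actions


if __name__ == "__main__":
    desired = {
        "cluster/etl": {"node_type": "i3.xlarge", "workers": "4"},
        "catalog/sandbox": {"owner": "data-science"},
    }
    current = {
        "cluster/etl": {"node_type": "i3.xlarge", "workers": "2"},
        "job/legacy": {"schedule": "daily"},
    }
    for action, name in plan(desired, current):
        print(f"{action:>6} {name}")
```

Running it prints one line per action: create `catalog/sandbox`, update `cluster/etl`, delete `job/legacy`. Much like a DataFrame query, the user never writes those steps by hand; they only edit the desired state.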

The talk is sprinkled with diagrams and Terraform code snippets to keep you entertained.

Session Speakers

Serge Smertin

Databricks
