Ganesh Chand is a data engineering consultant at Databricks with 10+ years of industry experience building enterprise-scale data solutions. He is particularly passionate about solving the world’s toughest data engineering problems. At Databricks, he tackles some of the toughest data engineering projects for Databricks customers. Outside of Databricks, he runs the Kathmandu Apache Spark meetup group and has given numerous presentations and workshops on Apache Spark and functional programming in Scala.
May 27, 2021 05:00 PM PT
Developing and deploying data pipelines in production is easy. Maintaining them is hard, because the engineer or team that operates and maintains a pipeline in production is often not the one that built it. If your data pipelines are not parameterized and configurable, you need to recompile your source code and go through your release process even for simple configuration changes. Making your data pipelines configurable is not enough, though. Bad user input can cause many classes of issues, such as data loss, data corruption, and incorrect data.
You'll walk away from this talk with techniques to make your data pipelines dumb-proof.
1. Why do you need to make your data pipelines configurable?
2. How to seamlessly promote your data pipelines from one environment to another without making any source code changes?
3. How to reconfigure your data pipelines in production without recompiling the ETL source code?
4. What are the pros and cons of using Databricks notebook widgets for configuring your data pipelines?
5. How to externalize configurations from your ETL source code, and how to read and parse configuration files?
6. Finally, you'll learn how to take it to the next level by leveraging Scala language features and the PureConfig and Typesafe Config libraries to achieve boilerplate-free configuration code and configuration validation.
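
As a rough illustration of topics 5 and 6, the sketch below keeps pipeline settings outside the ETL code and validates them before the job runs. The talk itself uses Scala with PureConfig and Typesafe Config; this example swaps in Python's standard library only to show the idea. The `PipelineConfig` class, its field names, and the sample values are hypothetical and do not come from the talk.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings for one ETL job; the field names are illustrative."""
    source_path: str
    target_table: str
    max_retries: int


def parse_config(text: str) -> PipelineConfig:
    """Parse JSON text into a PipelineConfig, rejecting bad input up front."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("config must be a JSON object")

    expected = {"source_path": str, "target_table": str, "max_retries": int}
    errors = []

    # Unknown keys are usually typos, so report them instead of ignoring them.
    for key in sorted(set(raw) - set(expected)):
        errors.append(f"unknown key: {key}")

    # Check that every expected key is present and has the right type.
    # bool is a subclass of int in Python, so it is rejected explicitly.
    for key, typ in expected.items():
        if key not in raw:
            errors.append(f"missing key: {key}")
        elif isinstance(raw[key], bool) or not isinstance(raw[key], typ):
            errors.append(f"{key} must be of type {typ.__name__}")

    # Range checks only make sense once the structure is known to be valid.
    if not errors and raw["max_retries"] < 0:
        errors.append("max_retries must not be negative")

    if errors:
        raise ValueError("invalid pipeline config: " + "; ".join(errors))
    return PipelineConfig(**raw)


# In a real pipeline this text would be read from a per-environment file;
# the values below are made up for the example.
dev_settings = (
    '{"source_path": "/mnt/dev/raw", "target_table": "dev.events", "max_retries": 3}'
)
config = parse_config(dev_settings)
```

Because every problem is collected and reported before any data is touched, a mistyped or missing setting stops the job at startup rather than partway through a write. Promoting the pipeline to another environment then means pointing it at a different settings file, with no change to the source code.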
October 15, 2019 05:00 PM PT
Why build your own analytics application on top of Delta Lake:
- Every enterprise is building a data lake. However, these data lakes are plagued by low user adoption and poor data quality, which result in lower ROI.
- BI tools may not be enough for your use case, especially when you want to build a data-driven analytical web application such as Paysa.
- Delta's ACID guarantees allow you to build a real-time reporting app that displays consistent and reliable data.
In this talk we will learn: