Enabling BI in a Lakehouse Environment: How Spark and Delta Can Help With Automating a DWH Development
- Data Lakes, Data Warehouses and Data Lakehouses
- Moscone South | Upper Mezzanine | 160
- 35 min
Traditional enterprise data warehouses typically struggle to handle large volumes of data and traffic, particularly semi-structured and unstructured data. Data lakes overcome these issues and have become the central hub for storing data. In this session we outline how to enable Kimball-style BI data modelling in a Lakehouse environment.
We will present why and how we built a Spark-based framework to modernize and automate data warehouse development with the data lake as central storage, while ensuring high data quality and scalability. The framework has already been implemented in more than 15 enterprise data warehouses across various companies in Europe.
We will show in depth how data warehouse principles such as surrogate, foreign and business keys and SCD type 1 and type 2 dimensions can be implemented with Spark and Delta Lake, while addressing the shortcomings of traditional data warehouses. Additionally, we will share our experience of how such a unified data modelling framework can help bridge BI with modern use cases such as machine learning and real-time analytics.
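To make the SCD type 2 and surrogate key ideas concrete, here is a minimal plain-Python sketch. It is not the framework presented in the session: the names `DimRow` and `apply_scd2` are hypothetical, and the in-memory list stands in for a dimension table that, in a Spark and Delta Lake setting, would more likely be maintained with a merge operation.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension member (hypothetical layout)."""
    surrogate_key: int          # warehouse-generated key, unique per version
    business_key: str           # key from the source system
    attributes: dict            # tracked attributes, e.g. {"city": "Berlin"}
    valid_from: date
    valid_to: Optional[date]    # None while the version is still open
    is_current: bool


def apply_scd2(dimension: List[DimRow],
               incoming: Dict[str, dict],
               load_date: date) -> List[DimRow]:
    """Return a new dimension with SCD type 2 history applied.

    `incoming` maps business keys to their latest attributes.
    Changed members get their current row closed and a new row opened;
    unseen business keys get a first row; everything else is kept as-is.
    """
    next_key = max((r.surrogate_key for r in dimension), default=0) + 1
    result: List[DimRow] = []
    keys_with_current_row = set()

    for row in dimension:
        if row.is_current:
            keys_with_current_row.add(row.business_key)
        new_attrs = incoming.get(row.business_key)
        if row.is_current and new_attrs is not None and new_attrs != row.attributes:
            # Close the old version and open a new one with a fresh surrogate key.
            result.append(replace(row, valid_to=load_date, is_current=False))
            result.append(DimRow(next_key, row.business_key, new_attrs,
                                 load_date, None, True))
            next_key += 1
        else:
            result.append(row)

    # Business keys without a current row are inserted as new members.
    for business_key, attrs in incoming.items():
        if business_key not in keys_with_current_row:
            result.append(DimRow(next_key, business_key, attrs,
                                 load_date, None, True))
            next_key += 1

    return result
```

An SCD type 1 dimension would instead overwrite `attributes` on the existing row and keep its surrogate key, so no history is retained. Fact tables would reference `surrogate_key` as their foreign key, which is what lets a fact row point at the version of the member that was valid at load time.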
This session is a natural fit for the Data & AI conference, given that the underlying technologies are Spark and Delta Lake. We welcome the opportunity to share our original challenges, the steps we took, the framework we built and the technical hurdles we faced along the way.