Data + AI Summit 2022

Powering Up the Business with a Lakehouse

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Lakes, Data Warehouses and Data Lakehouses

Industry

  • Retail and Consumer Goods

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 159

Duration

  • 35 min

Overview

At Wehkamp we needed a uniform way to provide reliable, on-time data to the business while keeping that access compliant with GDPR. Unlocking the data sources scattered across the company and democratizing data access were of the utmost importance, allowing us to empower the business with more, better and faster data.

Building on open source technologies, we built a data platform almost from the ground up around three levels of data curation (bronze, silver and gold), following the lakehouse architecture.
PII fields are pseudonymized during ingestion into bronze, which keeps use of the data within the Delta Lake compliant. Because no user data is visible, everyone can use the entire Delta Lake for exploration and new use cases. Specific teams are still allowed to see the user data that their use cases require.
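
As an illustration of pseudonymizing PII on the way into bronze, here is a minimal sketch in plain Python. It is not Wehkamp's implementation: the field names, the `to_bronze` helper and the choice of keyed HMAC-SHA256 are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical PII field names; the session does not list the real ones.
PII_FIELDS = {"email", "customer_name"}


def pseudonymize(value: str, key: bytes) -> str:
    """Return a deterministic, keyed token for a PII value (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_bronze(record: dict, key: bytes) -> dict:
    """Copy a raw record, replacing non-null PII fields with pseudonyms."""
    return {
        name: pseudonymize(str(value), key)
        if name in PII_FIELDS and value is not None
        else value
        for name, value in record.items()
    }


if __name__ == "__main__":
    secret = b"example-key"  # assumption: in practice the key would come from a secret store
    raw = {"order_id": 7, "email": "jane@example.com", "customer_name": "Jane"}
    print(to_bronze(raw, secret))
```

Because the token is deterministic for a given key, the same customer maps to the same pseudonym in every table, so joins and aggregations still work on pseudonymized data. How authorized teams get back to real user data is not described in the session and is left out of this sketch.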
Beyond the standard architecture, we developed a library that lets us ingest a new data source by adding a JSON config file that describes its characteristics. Combined with the ACID transactions that Delta Lake provides and efficient Structured Streaming through Auto Loader, this has allowed a small team to maintain 100+ streams with negligible downtime.
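
To make the config-driven idea concrete, the sketch below parses a JSON source description and derives Auto Loader reader options from it. The config keys, the `SourceConfig` class and both helper functions are hypothetical, since the session does not show the library's actual schema. Only the `cloudFiles.format` and `cloudFiles.schemaLocation` option names are taken from Auto Loader itself.

```python
import json
from dataclasses import dataclass, field

# Hypothetical config for one source; the real library's keys are not shown in the session.
EXAMPLE_CONFIG = """
{
  "name": "orders",
  "path": "/landing/orders",
  "file_format": "json",
  "target_table": "bronze.orders",
  "pii_fields": ["email"]
}
"""


@dataclass
class SourceConfig:
    name: str
    path: str
    file_format: str
    target_table: str
    pii_fields: list = field(default_factory=list)


def load_config(text: str) -> SourceConfig:
    """Parse a JSON source description; unknown or missing keys raise TypeError."""
    return SourceConfig(**json.loads(text))


def auto_loader_options(cfg: SourceConfig, schema_root: str) -> dict:
    """Build reader options for a 'cloudFiles' (Auto Loader) stream."""
    return {
        "cloudFiles.format": cfg.file_format,
        "cloudFiles.schemaLocation": f"{schema_root}/{cfg.name}",
    }


if __name__ == "__main__":
    cfg = load_config(EXAMPLE_CONFIG)
    print(auto_loader_options(cfg, "/schemas"))
```

On Databricks, a dictionary like this could be passed to `spark.readStream.format("cloudFiles").options(**opts).load(cfg.path)`, with the result written to the Delta table named in the config. Under this design, adding a source means adding one more JSON file rather than writing a new job.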

Other components of the platform include:
- Alerting to Slack
- Data quality checks
- CI/CD
- Stream processing with the Delta Engine

The feedback so far has been encouraging: more and more teams across the company are starting to use the new platform and take advantage of its features. It will still be a long time before we can turn off some components of the old data platform, but the new one has come a long way.

Session Speakers

Ricardo Simon Moreira Wagenmaker

Senior Data Engineer

Wehkamp
