Powering Up the Business with a Lakehouse
- Data Lakes, Data Warehouses and Data Lakehouses
- Retail and Consumer Goods
- Moscone South | Upper Mezzanine | 159
- 35 min
Focusing on open source technologies, we've built a data platform almost from the ground up that focuses on 3 levels of data curation - bronze, silver and gold - which follows the LakeHouse Architecture.
The ingestion into bronze is where the PII fields are pseudonymized, making the use of the data within the delta lake compliant and, since there is no visible user data, it means everyone can use the entire delta lake for exploration and new use cases. Naturally, specific teams are allowed to see some user data that is necessary for their use cases.
Besides the standard architecture, we've developed a library that allows us to ingest new data sources by adding a JSON config file with the characteristics. This combined with the ACID transactions that delta provides and the efficient Structured Stream provided through Auto Loader has allowed a small team to maintain 100+ streams with insignificant downtime.
Some other components of this platform are the following:
- Alerting to Slack
- Data quality checks
- Stream processing with the delta engine
The feedback so far has been encouraging, as more and more teams across the company are starting to use the new platform and taking advantage of all its perks. It is still a long time until we get to turn off some of the components of the old data platform, but it has come a long way.