Marc Planagumà is Head of Data Engineering at SCRM-LIDL Digital hub. He is a senior data engineer with expertise in building and managing highly scalable platforms and teams for advanced analytics. He currently focuses on designing and building a data platform for the LIDL Loyalty Program and Schwarz Group. Previously he did the same at Zurich Insurances Corp as Platform Manager, at Eurecat Technology Center as Chief Data Engineer, at Berlin Big Data Center as guest researcher, and at Barcelona Digital and Telefonica I+D as researcher.
He holds an MSc in Telecommunications and has more than 15 years of experience in data architecture, software engineering, DevOps management and agile methodologies. He is director and lecturer of the Data Engineering Master at Universitat de Barcelona IL3. He also has long experience as a distributed systems researcher at R&D centers such as Telefonica I+D, BDigital, Eurecat and Berlin Big Data Center. His specialties are distributed systems, NoSQL, big data analytics and cloud computing.
May 28, 2021 10:30 AM PT
Despite the increased availability of ready-to-use generic tools, more and more enterprises are deciding to build in-house data platforms. This practice, common for some time in research labs and digital native companies, is now making waves across large enterprises that traditionally used proprietary solutions and outsourced most of their IT. The availability of large volumes of data, coupled with increasingly complex analytical use cases driven by innovations in data science, has made these traditional on-premise architectures obsolete in favor of cloud architectures powered by open source technologies.
Building an in-house platform at a large enterprise comes with many challenges of its own: designing an architecture that combines the best elements of data lakes and data warehouses to accommodate all kinds of use cases, from BI to ML; interoperating with all of the company's data and technology, including legacy systems; and driving a cultural transformation, including a commitment to adopt agile processes and data-driven approaches.
This presentation describes the success story of building a Lakehouse at LIDL, a successful chain of grocery stores operating in 32 countries worldwide. We will dive into the cloud-based architecture for batch and streaming workloads, which draws on many different source systems across the enterprise, and show how we applied security to both the architecture and the data. We will detail the creation of a curated Data Lake comprising several layers, from a raw ingestion layer up to a layer that presents cleansed and enriched data to the business units as a kind of Data Marketplace.
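As a rough illustration of the layering idea only, the following minimal Python sketch moves records from a raw layer to a cleansed layer and then to an enriched layer. It is not the architecture presented in the talk: the record shape (`customer_id`, `country`, `amount`), the function names and the validation rules are all hypothetical, and an actual Lakehouse would run such steps on a distributed engine rather than on in-memory lists.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


# Hypothetical record shape for the cleansed layer; field names are illustrative.
@dataclass(frozen=True)
class Purchase:
    customer_id: str
    country: str
    amount: float


def to_cleansed(raw_rows: Iterable[dict]) -> List[Purchase]:
    """Raw -> cleansed: normalise types and drop rows that fail basic checks."""
    cleansed: List[Purchase] = []
    for row in raw_rows:
        customer_id = str(row.get("customer_id") or "").strip()
        country = str(row.get("country") or "").strip().upper()
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric amount: keep it out of the cleansed layer
        if not customer_id or not country or amount < 0:
            continue  # assumed rule: identifiers are required, amounts are non-negative
        cleansed.append(Purchase(customer_id, country, amount))
    return cleansed


def to_enriched(purchases: Iterable[Purchase]) -> Dict[Tuple[str, str], float]:
    """Cleansed -> enriched: total spend per (country, customer) for consumers."""
    totals: Dict[Tuple[str, str], float] = {}
    for p in purchases:
        key = (p.country, p.customer_id)
        totals[key] = totals.get(key, 0.0) + p.amount
    return totals
```

The point of the split is that each layer has a single responsibility: the raw rows stay untouched, the cleansed layer enforces a schema, and the enriched layer exposes business-ready aggregates.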
A lot of focus and effort went into building a semantic Data Lake as a sustainable and easy-to-use basis for the Lakehouse, as opposed to just dumping source data into it. The first use case applied to the Lakehouse is the Lidl Plus Loyalty Program. It is already deployed to production in 26 countries, with data from more than 30 million customers being analyzed on a daily basis. In parallel to productionizing the Lakehouse, a cultural and organizational change process was undertaken to get all involved units to buy into the new data-driven approach.