Zalando strives to be a fully data-driven company that uses AI to make decisions quickly and accurately. For this reason we have built a Data Lake that contains all of the company's data. To provide easy access to that data and enable the company to make use of it, we have established an internal platform that offers Databricks as a service to all departments and teams. Making Databricks Delta tables available to all clients of the Data Lake enabled them to leverage Structured Streaming and to build continuous applications on top of it. A big part of this journey was solving challenges in governance, security and access management.
In this talk we want to share our experience in productionizing and operating Databricks at scale and in making data-driven continuous applications feasible out of the box.
Data Engineer with 7+ years of data and software engineering experience. Holds a degree in applied mathematics and an MSc in computer science, focused mostly on data processing and analysis. Currently working at Zalando (Berlin) on the company's Data Lake project, building an internal data platform on top of S3, Spark, Presto and serverless cloud technologies, enabling machine learning and AI for all teams and departments of the company, and addressing GDPR compliance in one place.
Max Schultze is a lead data engineer working on building a data lake at Zalando, Europe’s biggest online platform for fashion. His focus lies on building data pipelines at petabyte scale and productionizing Spark and Presto on Delta Lake inside the company. He graduated from the Humboldt University of Berlin, where he actively took part in the university’s initial development of Apache Flink.