George Claireaux

Data Engineer

George graduated from the University of Manchester, UK, with a BSc in Business Management. At Mars Petcare, he has led the construction of Kyte, an advanced pipeline tool built on Databricks and Spark that processes data into Delta Lake on the Mars Petcare Data Platform.

Past sessions

At Mars Petcare (in a division known as Kinship Data & Analytics), we are building out the Petcare Data Platform, a cloud-based Data Lake solution. Building on Microsoft Azure, we faced important decisions about tools and design. We chose Delta Lake as the storage layer for our platform, to bring insight to the science community across Mars Petcare. Migrating completely away from Azure Data Factory, we used Spark and Databricks to build 'Kyte', a bespoke pipeline tool that has greatly accelerated our ability to ingest, cleanse and process new data sources from across our large and complex organisation. Building on this, we have started to use Delta Lake for our ETL configurations and have built a bespoke UI for monitoring and scheduling our Spark pipelines. Find out why we chose a Spark-heavy ETL design and a Delta Lake-driven platform, the advantages (and difficulties) of migrating away from Azure Data Factory, and why we are committing to Spark and Delta Lake as the core of our platform to support our mission: Making a Better World for Pets!

Key Takeaways:

  • Leveraging Delta Lake as Engineers to Expose Data to Data Scientists
  • Advantages of a Databricks & Spark ETL Solution over Azure Data Factory
  • Using Delta Lake for ETL Config