60% infrastructure cost reduction
40% productivity benefit for data engineering delivery
99.6% end-to-end ingestion time reduction
Gousto’s mission is to change the way people eat through the delivery of boxes of fresh ingredients and easy-to-follow recipes. But the company is much more than a food delivery service. Gousto aims to leverage data and AI to create a more convenient and personalized experience for their customers. In the words of Gousto’s Chief Technology Officer, Shaun Pearce, “Gousto is a data company that loves food.” However, even before the massive pandemic-driven increase in demand, Gousto was at a crossroads. “We were at capacity. Demand was huge, but we couldn’t scale the fulfillment supply chain,” said Pearce.
At Gousto, recipe boxes take a journey through a fulfillment center on a conveyor belt, past a series of stations where boxes stop for an agent to pack ingredients. An efficient journey means more boxes per hour. Detailed box ingredients and stock inventory data were being collected, but couldn’t be analyzed quickly enough to optimize their route or anticipate ingredient availability. The ETL batch process was taking over two hours — if it worked — meaning Gousto was always looking back at historical data and relying on ad hoc observations to make strategic decisions for the business. This latency was impacting a key performance measure — on time in full (OTIF). “We knew we had to rethink our ETL to deliver production-line data in near real-time,” said Eoin O’Flanagan, who heads up data engineering at Gousto.
Added to this, Gousto had disparate systems and data sources managing warehouse stock levels and ingredient replenishment at each pick station, so it was impossible to gain a unified view into the data without significant manual effort. “The systems were not designed for querying and analytics, and we needed visibility across the whole line to make efficiency improvements,” said O’Flanagan.
Time spent troubleshooting failed batch data ingest jobs and maintaining infrastructure was time taken away from developing more sophisticated data analytics. “We needed to reduce infrastructure tasks so we could focus on collaborating to write code, not dealing with pipeline and EMR issues, so that new ideas could become a reality much faster,” said O’Flanagan. Gousto needed to move from daily batch updates on costly EMR to near real-time streaming data.
Gousto realized they needed a cost-effective cloud-based platform on AWS that would address their current challenges and support their ambitions for the future. With the Databricks Lakehouse Platform in place, Gousto now enjoys a coherent view of supply chain events. Merging the incoming streams of pick station inventory data and warehouse stock-level data into Delta tables in Delta Lake, using MERGE and change data capture (CDC), gives Gousto a view of pick-line status refreshed in near real-time, so agents have the right ingredients at their stations to pack their assigned boxes. “Getting a near real-time view on inventory data is unlocking much more capability in our stock management, giving us a view on stock movement to the line and improved planning. We couldn’t do it without Delta Lake,” said O’Flanagan.
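At its core, the MERGE/CDC pattern described above is an upsert: each change event either updates an existing inventory record, inserts a new one, or deletes one. The sketch below illustrates those semantics in plain Python; the station and event names are illustrative assumptions, not Gousto's actual schema, and in practice Delta Lake's `MERGE INTO` performs the equivalent atomically at table scale.

```python
# Minimal sketch of the upsert semantics behind a Delta Lake MERGE fed by
# change-data-capture events. The event shape and keys are assumptions for
# illustration; Delta's MERGE INTO applies WHEN MATCHED / WHEN NOT MATCHED
# clauses transactionally across the whole table.

def apply_cdc_batch(inventory, change_events):
    """Upsert a micro-batch of CDC events into an inventory snapshot
    keyed by (station, ingredient)."""
    for event in change_events:
        key = (event["station"], event["ingredient"])
        if event["op"] == "delete":
            inventory.pop(key, None)       # WHEN MATCHED ... DELETE
        else:
            inventory[key] = event["qty"]  # WHEN MATCHED UPDATE / WHEN NOT MATCHED INSERT
    return inventory

inventory = {("pick-01", "basil"): 40}
events = [
    {"station": "pick-01", "ingredient": "basil", "qty": 25, "op": "update"},
    {"station": "pick-02", "ingredient": "tomato", "qty": 80, "op": "insert"},
]
inventory = apply_cdc_batch(inventory, events)
print(inventory)  # both streams reconciled into one current view
```

Because each batch of events is folded into the current snapshot rather than rebuilt from scratch, the pick-line view stays fresh without re-reading historical data.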
The Databricks platform also means Gousto now has the right tooling to spin up new clusters in minutes to work on new ideas without delay. “Data preparation time has reduced from 3 hours to 5 minutes. We can easily attach notebooks to prototype new features and debug issues, collaborate on the same data, and the feedback loop for trying out new code has gone down from 20 minutes to 20 seconds,” said O’Flanagan.
Running a sequence of batch jobs used to be a complex pipeline with multiple points of failure for Gousto’s data engineers, slowing their ability to feed clean data for downstream analytics to optimize their supply chain. Now, “we’ve switched off the batch pipeline, and our supply chain ETL stack is 100% streaming, with no failures since deployment,” said O’Flanagan.
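One reason a streaming pipeline avoids the all-or-nothing failure mode of a long batch job is checkpointing: each micro-batch commits an offset after a successful write, so a restart resumes from the last good point instead of re-running hours of work. The sketch below (an assumption-laden illustration, not Gousto's code; Spark Structured Streaming handles this via its checkpoint location) shows the idea:

```python
# Illustrative sketch of checkpointed micro-batch ingestion. Batch size and
# the checkpoint dict are assumptions for the example; in Structured
# Streaming the engine tracks offsets in a checkpoint directory.

def run_stream(source_events, checkpoint, process):
    """Consume events incrementally, committing an offset after each micro-batch."""
    batch_size = 2  # assumed micro-batch size
    offset = checkpoint.get("offset", 0)
    while offset < len(source_events):
        batch = source_events[offset:offset + batch_size]
        process(batch)                 # transform + write downstream
        offset += len(batch)
        checkpoint["offset"] = offset  # commit only after a successful write
    return checkpoint

processed = []
checkpoint = {"offset": 0}
run_stream([1, 2, 3, 4, 5], checkpoint, processed.extend)
print(processed, checkpoint)
```

If the process crashes mid-stream, rerunning `run_stream` with the same checkpoint picks up only the unprocessed events, which is what makes "no failures since deployment" a realistic property of the streaming stack.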
Gousto now enjoys near real-time reporting on fulfillment center performance and inventory levels. End-to-end ingestion time has been reduced by 99.6% — a job that took two hours is down to 15 seconds, and the ingestion infrastructure costs have been reduced. “Now we can drill down to see the reason for bottlenecks on the fulfillment line and optimize box routing and inventory slotting. It’s a proactive rather than reactive approach,” said O’Flanagan. “Which is good news for our OTIF measure, even in times of record demand,” said Pearce.
The improved availability and accuracy of data mean Gousto can confidently expand their customer fulfillment capacity from two factories to four in the next year. As a bonus, this has also delivered significant infrastructure cost savings, most notably around EMR. “There’s more data running in real-time but with no increase in costs versus EMR. Like for like, it works out as a 60% cost improvement,” said O’Flanagan.
In fact, because Gousto tracks ticket metrics in every development sprint, they know there’s been a 40% productivity benefit for data engineering delivery. “We have reduced the time it takes to develop new ideas from days to minutes and increased the availability and accuracy of our data,” said Pearce.
Databricks is helping Gousto deliver on today’s known challenges and has Gousto ready for the unknown ones of the future — positioning the team to harness AI and ML and realize their objectives around menu personalization and next-day delivery. “Databricks has helped us take giant steps toward our ambitions. It’s a platform with a huge potential,” said Pearce.