CUSTOMER STORY

Valora accelerates data-driven retail operations with Databricks Lakeflow

Spark Declarative Pipelines helps Valora handle data transformation challenges at scale

Valora Holding AG is a Swiss retail and food service company that operates convenience stores and food service locations in five European countries. Following its 2022 acquisition by Mexican multinational FEMSA, Valora embarked on a mission to become more data-driven in its decision-making processes. To accomplish this, Valora turned to Databricks Lakeflow on the Data Intelligence Platform to improve data quality and streamline its data engineering processes.

Building a foundation for data-driven retail operations

Following the FEMSA acquisition, Valora established a data platform team led by Daniel Habermehl, with Lukas Starke responsible for data engineering and platform ownership and Alexane Rose responsible for data and AI architecture. This centralization allows Valora to standardize IT services across different brands, creating a foundation for advanced analytics and AI initiatives.

The team quickly realized that their existing data infrastructure couldn't support their transformation goals. Their legacy Teradata system was reaching end-of-life, and while it supported basic reporting, it lacked the flexibility needed to pursue the more advanced analytics, data science, and AI initiatives they were planning. As they began expanding their data science resources, data quality issues also became apparent. "We realized a lot of the master data had issues and we had to do a lot of cleanup," explains Rose. "It was necessary to set the foundation because without it, we'd face 'garbage in, garbage out' problems. We needed to work with the business to understand what we're measuring and where we're introducing noise and garbage into our data."

Valora recognized that supporting sophisticated data science and AI use cases required a platform that could handle data quality challenges at scale, support diverse analytical workloads, and provide flexibility as their data-driven capabilities matured.

Streamlining data engineering with Lakeflow and Spark Declarative Pipelines

Valora’s migration initiative began in 2023 and went live at the end of 2024, moving their legacy Teradata workloads onto the Databricks Data Intelligence Platform. During this period, the team also began enabling analytical use cases on Databricks ahead of the full cutover. As a result of the migration, Valora now runs its data pipelines on Lakeflow, bringing previously disparate technology stacks into a cohesive and unified data platform.

One of the core products from Rose’s original team was a dashboard fed by daily batch workloads that ran as jobs.

When the team wanted to start streaming master data via change data capture (CDC), they realized they could seamlessly tackle this requirement with AutoCDC in Lakeflow Spark Declarative Pipelines (SDP). "We were already using Auto Loader in the basic workflow, but using Spark Declarative Pipelines on top of Auto Loader was quite nice," explains Rose. "We gained a lot by doing CDC in SDP, because you don't write any code—it’s all abstracted in the background. The CDC minimizes the number of lines… it’s so easy to do. The fact that you don’t have to declare dependencies is really nice. Overall, it just gives a really clean experience."
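To make concrete what a CDC step abstracts away, here is a minimal conceptual sketch in plain Python. It is not the Lakeflow or SDP API and not Valora's code. It only illustrates the kind of logic a change feed implies: keep the latest row per key, order changes by a sequence value, ignore late-arriving older changes, and honor deletes. The `ChangeEvent` type, its field names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One record from a change feed (all field names are hypothetical)."""
    key: str                        # primary key of the master-data row
    sequence: int                   # ordering value; higher means newer
    operation: str                  # "UPSERT" or "DELETE"
    payload: Optional[dict] = None  # row contents for upserts


def apply_changes(events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Collapse a change feed into the current row per key."""
    latest_seq: Dict[str, int] = {}   # newest sequence applied per key
    table: Dict[str, dict] = {}       # current state of the target table
    for ev in events:
        # Skip events that are not newer than what was already applied,
        # so out-of-order arrivals cannot overwrite fresher data.
        if ev.key in latest_seq and ev.sequence <= latest_seq[ev.key]:
            continue
        latest_seq[ev.key] = ev.sequence
        if ev.operation == "DELETE":
            table.pop(ev.key, None)
        else:
            table[ev.key] = dict(ev.payload or {})
    return table


# Invented sample feed, including one late-arriving event and one delete.
events = [
    ChangeEvent("store-1", 1, "UPSERT", {"status": "planned"}),
    ChangeEvent("store-2", 1, "UPSERT", {"status": "open"}),
    ChangeEvent("store-1", 3, "UPSERT", {"status": "open"}),
    ChangeEvent("store-1", 2, "UPSERT", {"status": "building"}),  # late, ignored
    ChangeEvent("store-2", 2, "DELETE"),
]
current = apply_changes(events)
print(current)
```

In a declarative pipeline, this bookkeeping is what the team no longer has to write or maintain by hand; the sketch is only meant to show why that matters once it must be repeated for every master-data table.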

That simplicity becomes especially valuable when applied to all of Valora’s master data, which scales quickly across the organization’s vast retail operations. SDP’s declarative approach also transformed how Valora builds and maintains data pipelines. SDP now powers real-time data flows for critical business use cases, from pricing optimization analysis to store layout revenue optimization for its various retail locations. The system handles complex scenarios including partial data loads and scheduled full reloads, providing visibility into data lineage and pipeline execution. “It’s magical how it traces the dependencies and puts it together,” adds Rose. “It’s very rewarding from a data engineering perspective, because you actually see it load… kudos to the people who created the new IDE.”

Lakeflow SDP also enables integrated data quality checks, ensuring reliable data for analytics without hard-coding validation rules. The platform's readability and user-friendly design make it easy for team members to understand and replicate workflows. That positions Valora to scale their data engineering capabilities beyond the central team to additional stakeholders across the organization. “SDP felt familiar very fast,” says Rose. “It’s extremely repeatable and it’s very easy to ramp up other team members because it’s so easy to read and replicate. It’s something we don’t have to worry about—it just works.”
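The idea of declaring quality rules separately from transformation logic can be sketched in plain Python as follows. This is an illustrative assumption of how named rules might be evaluated, not the SDP API or Valora's actual checks. The rule names, column names, and sample rows are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Rule = Callable[[Row], bool]

# Hypothetical named rules, declared as data rather than buried in pipeline code.
RULES: Dict[str, Rule] = {
    "store_id_present": lambda r: bool(r.get("store_id")),
    "price_non_negative": lambda r: (
        isinstance(r.get("price"), (int, float)) and r["price"] >= 0
    ),
}


def apply_rules(rows: List[Row], rules: Dict[str, Rule]) -> Tuple[List[Row], Dict[str, int]]:
    """Keep rows that pass every rule and count violations per rule."""
    kept: List[Row] = []
    violations = {name: 0 for name in rules}
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        for name in failed:
            violations[name] += 1
        if not failed:
            kept.append(row)
    return kept, violations


# Invented sample rows: one valid, one missing its key, one with a bad price.
rows = [
    {"store_id": "s1", "price": 2.5},
    {"store_id": "", "price": 1.0},
    {"store_id": "s2", "price": -4.0},
]
clean_rows, violation_counts = apply_rules(rows, RULES)
print(clean_rows, violation_counts)
```

Keeping the rules in one named collection is what makes a pipeline easy to read and replicate: a new team member can see what is being enforced without tracing validation code through each transformation.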

Accelerating data-driven decision making across retail operations

The Databricks Data Intelligence Platform and Lakeflow Spark Declarative Pipelines help ensure Valora’s data platform team delivers high-quality data, which feeds into their goal to enhance data-driven decision-making capabilities. The streamlined data engineering processes support sophisticated analytics use cases that were previously impossible with their legacy Teradata infrastructure, enabling real-time insights that directly impact revenue generation.

The team also values the support they receive from their Databricks Solution Architect and Databricks’ continuous platform development, which consistently delivers new features. “We see that there's real development every month in Databricks,” says Starke. “And those new features are things that we can actually use, they're not just things that are nice to have.”

Looking ahead, Valora continues to evaluate the full potential of SDP and is considering migrating additional workflows to SDP as they scale their data capabilities beyond the central team. “Standardizing on Databricks is the best decision we've ever made,” adds Rose.

Details

Industry: Retail and Consumer Goods
Use Case: Data Warehousing, Data Engineering
Product: Lakeflow Spark Declarative Pipelines
