CUSTOMER STORY

How Afresh Replaced Azure Postgres with Lakebase and Unified Data for 12,500+ Grocery Departments

2-day ML refresh fixes

Reduced to one-line code change

4 product teams

Served by a single Lakebase instance

10M-row duplicate dataset

Eliminated from analytics pipeline

Afresh Technologies Inc. is on a mission to eliminate food waste and make fresh food accessible to all, powering ordering and inventory decisions across more than 12,500 grocery departments. As the company scaled, Postgres syncs and duplicated datasets demanded increasing attention from a focused team of data experts, leaving less capacity for product work. By standardizing on Lakebase within the Databricks Platform, Afresh unified operational and analytical data under Unity Catalog, improving reliability and freeing engineers to ship products.

What is Lakebase? Lakebase is a fully managed, serverless PostgreSQL database natively integrated into the Databricks Platform. It provides a transactional data layer that sits alongside Delta tables. Teams can sync lakehouse data into read-only Lakebase tables for low-latency application serving, and separately sync Lakebase transactions back to lakehouse tables for downstream analytics. Both directions work without custom ETL pipelines or the need to manage permissions across separate stacks.

How duplicated datasets and manual syncs strained capacity for the data team

Every morning, Afresh ingests flat files from grocery partners and feeds the standardized, secure data into machine learning models that tell store associates exactly what to order across produce, meat, and bakery departments. Four product teams depend on that daily cadence, a setup that leaves little room to absorb issues or focus on new product work.

Afresh ran its transactional application on Azure-hosted PostgreSQL, while pipelines and machine learning ran on the Databricks Platform following a migration off Snowflake. Connecting the two meant staging Postgres extracts, converting them to Parquet and uploading them into Databricks just to compare what the product did with what the upstream data said. For analytics, the team duplicated operational tables into a secondary dataset that grew to tens of millions of rows and required periodic fixes as new stores came online. A single monolithic refresh swapped 20 tables at once to maintain referential integrity, so an issue in one area could hold up unrelated updates, such as a simple seed file refresh.

As Afresh grew, the demands on this architecture grew with it. An Azure Postgres version upgrade unexpectedly removed a cache that the application relied on, reducing IOPS and affecting product performance for several days. Spiky login traffic and daily recomputation added further load. "A huge proportion of the data in our database changes every day, and we needed infrastructure that could keep up without a small team managing sync jobs around the clock," said Erin Leonhard Zhang, Staff Software Engineer at Afresh. "Data sync is part of our product, but it is not our special sauce."

Afresh deploys Lakebase to consolidate operational and analytical data

Afresh adopted Lakebase, the fully managed Postgres database built natively into the Databricks Platform, as the transactional layer behind nearly every product surface. The move placed operational data alongside pipelines, machine learning and BI inside one governed environment, with additional Lakebase instances planned as new products come online.

The team used the migration to rebuild reliability rather than swap a database endpoint. Monolithic refreshes were broken into modular, dbt-managed jobs so a single failure could no longer cascade across unrelated data. Coordinated 20-table swaps gave way to independent table refreshes, eliminating a long-running source of unreliability. Because dbt was already familiar to every engineer at Afresh, the new architecture consolidated the team's stack instead of expanding it.
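The difference between a coordinated swap and independent refreshes can be illustrated with a minimal Python sketch. This is not Afresh's dbt implementation: the table names, the `succeed` and `fail` helpers and both refresh functions are hypothetical, chosen only to show why one failing table blocks everything in the monolithic case but not in the modular one.

```python
from typing import Callable, Dict, List, Tuple

RefreshJob = Callable[[], None]


def monolithic_refresh(jobs: Dict[str, RefreshJob]) -> List[str]:
    """All-or-nothing: publish every table only if every job succeeds."""
    staged: List[str] = []
    for name, job in jobs.items():
        try:
            job()
        except Exception:
            # One failure aborts the coordinated swap, so nothing is published.
            return []
        staged.append(name)
    return staged


def modular_refresh(jobs: Dict[str, RefreshJob]) -> Tuple[List[str], List[str]]:
    """Independent refreshes: each table succeeds or fails on its own."""
    refreshed: List[str] = []
    failed: List[str] = []
    for name, job in jobs.items():
        try:
            job()
            refreshed.append(name)
        except Exception:
            failed.append(name)
    return refreshed, failed


def succeed() -> None:
    """Hypothetical job that refreshes cleanly."""


def fail() -> None:
    """Hypothetical job that hits an upstream data issue."""
    raise RuntimeError("upstream data issue")


# Hypothetical table names, for illustration only.
EXAMPLE_JOBS: Dict[str, RefreshJob] = {
    "promotions": fail,
    "seed_files": succeed,
    "orders": succeed,
}

if __name__ == "__main__":
    print(monolithic_refresh(EXAMPLE_JOBS))
    print(modular_refresh(EXAMPLE_JOBS))
```

In this sketch the monolithic path publishes nothing when the promotions job fails, while the modular path still refreshes the seed files and orders tables and reports promotions as the only failure.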

With Lakebase tables governed through Unity Catalog, analysts now query operational data directly alongside Delta tables, giving teams visibility that previously required circular sync patterns to achieve. Afresh is also rebuilding reverse-refresh logic as dbt models that read from Lakebase through Unity Catalog, retiring the staging and format-conversion steps that once stood between Postgres and Delta Lake. "Having our operational data in Unity Catalog changed how our teams work together," said Erin. "Analysts, product engineers and data engineers all query from the same place instead of rebuilding the same datasets three different ways."

Lakebase use cases at Afresh

Afresh’s entire store-level product suite now runs on Lakebase. Current use cases include:

Daily ordering recommendations: ML models generate inventory and order recommendations for grocery departments and sync results through Lakebase into the ordering app used by store associates each morning.
Inventory estimation: Lakebase provides pre-populated on-hand counts, so associates can review and confirm inventory without manually counting every item.
Analytics and historical analysis: Analysts query operational data directly through Unity Catalog, retiring the 10-million-row secondary dataset and circular sync pipeline that required maintenance.
Modular data refreshes: Independent, dbt-managed table syncs replaced the monolithic 20-table coordinated swap, so a failure in one pipeline no longer blocks unrelated updates.

Faster fixes and fewer refresh incidents as Afresh scales to more stores

With Lakebase in place, the maintenance work that once consumed a high-impact data team has largely dissolved. The 10-million-row secondary dataset that analysts needed to reconstruct what a store associate saw at the moment of an order is gone. Product engineers no longer add columns to core tables solely to serve reporting needs. And new engineers onboard into a single governed environment with centralized permissions through Unity Catalog, rather than juggling VPN access, separate credentials and legacy deployment knowledge.

The benefits are clearest in time-sensitive situations. When the Machine Learning team needed to add a new feature to their forecaster, they were able to reverse-refresh the required data directly in dbt with a single line of code. The same change previously would have taken two days of working through stale packages and untested deployment paths. Modular refreshes have also reduced the impact of routine failures, so an unrelated promotions issue can no longer hold up a seed file update. "We used to have several different ways of doing a refresh. Now every team follows the same mechanism, which has made it much easier to standardize," said Mary Keenan, Staff Software Engineer.

Looking ahead, Afresh plans to adopt Lakebase autoscaling to better handle spiky login traffic and reduce costs, and to spin up additional Lakebase instances as its product architecture grows more modular. The team also plans to move more orchestration from Airflow into Databricks to power use cases such as a 10-minute shipments pipeline that batch infrastructure cannot support today. "When your data platform handles transactions, pipelines and analytics in one place, you stop stitching tools together and start focusing on what you're actually good at. For us, that's helping grocers cut waste and keep fresh shelves stocked," said Mary.

Details

Industry: Technology and Software
Use Case: Application Development, Data Warehousing, Governance and Security, Data Intelligence Platform
Cloud: Azure
Product: Delta Lake, Lakebase, Unity Catalog
