Automate job execution on historical data to ensure more accurate downstream data and solve data quality issues
Backfill runs are now generally available in Lakeflow Jobs.
Managing complex data ecosystems with numerous sources and constant updates is challenging for data engineering teams. They often face unpredictable but common issues like cloud vendor outages, broken connections to data sources, late-arriving data, or even data quality issues at the source. Other times, they have to deal with sudden business rule changes that impact the entire data orchestration.
The result? Downstream data is stale, inaccurate, or incomplete. Backfilling, that is, rerunning jobs against historical data, is the usual fix. But traditional manual, ad hoc backfills are tedious, error-prone, and don't scale, which slows the resolution of common data quality issues.
Imagine you are a data engineer at a retail company responsible for creating a weekly order summary report for the Business Intelligence team. Your report is critical for tracking revenue and customer behavior for dynamic sales generation. The job is scheduled to run every Monday morning, before the new work week begins, and uses the iso_datetime job parameter to timestamp its data.
One morning, you discover that the connection to one of your data sources has been broken for the past 3 weeks. Critical pricing data was omitted, making your entire summary table inaccurate. At the same time, the marketing team has introduced a new formula for calculating customer lifetime value (LTV), and they need all historical order data to be reprocessed to reflect this new business logic. This adds another layer of complexity to your data orchestration, and it needs to be addressed promptly because marketing analytics and strategy depend on it.
Lakeflow Jobs can resolve both issues with the new backfill runs, which process historical data directly in Lakeflow Jobs through a no-code UI. After clicking “Run backfill” in the Jobs UI, you can configure the date and time range for the historical data, choose the granularity at which you want to run the job, and select the parameters you want to override for that backfill, all without writing a single line of code.

In the image above, 7 backfill runs will be created at one-day intervals, the first on October 9th, 2025 at 10:00 AM, and the last on October 16th, 2025 at 10:00 AM. The parameter "backfill.iso_date" will be passed into each backfill run (e.g. 2025-10-09 10:00:00.000 for the first run). Once “Run” is clicked, 7 concurrent runs will be automatically triggered to backfill the data in your job.
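To make the arithmetic concrete, here is a minimal Python sketch of how a date range and a granularity expand into one timestamp per run. It is an illustration only, not how Lakeflow Jobs implements backfills: the function names backfill_timestamps and format_param are hypothetical, and the sketch assumes the end of the range is an exclusive boundary. Under that assumption the one-day windows start on October 9th and the last window ends on October 16th.

```python
from datetime import datetime, timedelta


def backfill_timestamps(start, end, step):
    """Return the start of each interval in the half-open window [start, end).

    Hypothetical helper: treating `end` as exclusive is an assumption of this
    sketch, not documented Lakeflow Jobs behavior.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    stamps = []
    current = start
    while current < end:
        stamps.append(current)
        current += step
    return stamps


def format_param(ts):
    """Format a timestamp like the example value 2025-10-09 10:00:00.000."""
    return ts.strftime("%Y-%m-%d %H:%M:%S.") + f"{ts.microsecond // 1000:03d}"


# Date range and one-day granularity taken from the example above.
runs = backfill_timestamps(
    start=datetime(2025, 10, 9, 10, 0),
    end=datetime(2025, 10, 16, 10, 0),
    step=timedelta(days=1),
)
params = [format_param(ts) for ts in runs]

for value in params:
    print(value)
```

Each printed value stands in for what one run would receive as its timestamp parameter. Switching the granularity means changing only the step, for example timedelta(hours=1) for hourly runs.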
In short, backfill runs in Lakeflow Jobs help you: