National Australia Bank (NAB), Australia’s largest business bank, is modernizing its enterprise data platform by standardizing on Spark Declarative Pipelines across more than 1,600 data pipelines used by 300–400 engineers.
After years of managing complex, hand-written Spark and legacy data warehouse ETL, NAB is moving to a fully declarative, streaming-first architecture - reducing latency, improving data quality, and dramatically simplifying how data pipelines are built and operated at scale.
Overhauling a data strategy with aggressive goals
Leading this transformation was the key priority for Dheeraj Puli, Head of Data Reliability Engineering. When he joined three years ago, Puli immediately recognized NAB’s advantage.
“I was delighted to see that everything was already in the cloud, while other banks were still trying to find ways to get there,” he recalls.
Despite being standardized on Databricks, NAB faced a familiar challenge at scale: hundreds of engineers building Spark pipelines in different ways, with limited consistency, guardrails, or reuse. The bank needed a single way to run, standardize, and productionize Spark workloads - from ingestion to transformation - without slowing teams down.
Lakeflow became that unifying layer, providing a consistent foundation for how data engineers access data, build pipelines, and operationalize Spark across the platform.
Puli’s mandate was to build and scale ADA, NAB’s central data platform. His team of roughly 20 engineers supports hundreds of data engineers while enforcing a simple design principle: latency should never exceed processing time.
“If processing takes 20 minutes, the data should be available in 20 minutes,” says Puli.
From custom Spark to Declarative Pipelines
To help hundreds of engineers converge on a single, consistent way of building pipelines on Lakeflow, NAB adopted Spark Declarative Pipelines - a set of declarative APIs in Apache Spark - and used Lakeflow to run and operationalize those pipelines with built-in orchestration, scaling, and reliability on Databricks.
Previously, NAB relied heavily on hand-written Spark and legacy SQL. Bronze ingestion was standardized, but Silver and Gold transformations used a custom Spark framework, creating significant complexity.
“In our Spark world, we had DMLs with 5,000 to 8,000 lines of code,” Puli says. “Some pipelines had 64 union-all operations to handle complex business logic.
“I call Spark Declarative Pipelines hyper-standardization. There’s only one way to do things - and that consistency is what we were missing.”
With declarative pipelines, engineers define what the data should look like - target tables, merge keys, and business logic - while Lakeflow handles orchestration, state management, and incremental processing automatically. Using this approach, NAB reduced some pipelines from 64 union-all operations down to just three, dramatically lowering operational risk and cognitive load.
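The shift from imperative to declarative can be illustrated with a toy sketch. This is not the Lakeflow or Spark Declarative Pipelines API - the `table` decorator, registry, and runner below are invented for illustration only. The point is the separation of concerns: each function declares what a target table contains, and a small runner decides how and in what order to materialize it.

```python
# Toy illustration of the declarative pattern (NOT the actual
# Lakeflow/Spark Declarative Pipelines API). Each decorated function
# declares *what* a target table contains; materialize() decides
# *how* and in what order to build the tables.

PIPELINE = {}  # table name -> (dependencies, transform function)

def table(name, depends_on=()):
    """Register a table definition instead of running it imperatively."""
    def register(fn):
        PIPELINE[name] = (tuple(depends_on), fn)
        return fn
    return register

@table("bronze_txns")
def bronze_txns(inputs):
    # Raw ingested records (hard-coded here for the sketch)
    return [{"id": 1, "amount": 120}, {"id": 2, "amount": -5}]

@table("silver_txns", depends_on=["bronze_txns"])
def silver_txns(inputs):
    # Business logic only: keep valid amounts; no orchestration code
    return [r for r in inputs["bronze_txns"] if r["amount"] > 0]

def materialize():
    """Resolve dependencies and build each table exactly once."""
    done = {}
    def build(name):
        if name not in done:
            deps, fn = PIPELINE[name]
            for d in deps:
                build(d)
            done[name] = fn({d: done[d] for d in deps})
        return done[name]
    for name in PIPELINE:
        build(name)
    return done

tables = materialize()
```

Because every definition follows the same shape, there is - as Puli puts it - only one way to do things: engineers write the business logic inside the function body, and ordering, retries, and state live in the runtime rather than in each pipeline.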
Building an end-to-end streaming architecture
NAB began by standardizing Bronze ingestion on Spark Declarative Pipelines, onboarding 120 data sources in the first year. Today, 100% of pipelines run declaratively at the Bronze layer, with roughly 50% migrated at Silver.
The long-term goal is a fully declarative, end-to-end streaming architecture - from Bronze to Gold.
Declarative pipelines also enabled incremental processing and built-in change data capture (CDC). Instead of reprocessing entire datasets, pipelines process only new or changed records - reducing cost and improving performance and reliability. Lakeflow’s built-in CDC capabilities, including AutoCDC, further eliminated the need to hand-code slowly changing dimension (SCD) logic that previously spanned thousands of lines of Spark and SQL.
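The incremental idea can be sketched in a few lines. This is a toy model of CDC-style merging, not AutoCDC's actual API or semantics - the `apply_changes` function, its `op` field, and the sample records are all invented for illustration. It shows why incremental processing is cheaper: only the change records are applied, and untouched rows are never reprocessed.

```python
# Toy sketch of incremental change-data-capture merging (the idea
# behind built-in CDC, NOT Lakeflow's actual API): apply only the
# new change records to the target, keyed on a merge key.

def apply_changes(target, changes, key="id"):
    """Upsert/delete change records into target rows."""
    merged = {row[key]: row for row in target}
    for change in changes:
        if change.get("op") == "delete":
            merged.pop(change[key], None)
        else:  # insert or update
            merged[change[key]] = {k: v for k, v in change.items() if k != "op"}
    return sorted(merged.values(), key=lambda r: r[key])

target = [{"id": 1, "balance": 100}, {"id": 2, "balance": 50}]
changes = [
    {"op": "update", "id": 2, "balance": 75},  # changed record
    {"op": "insert", "id": 3, "balance": 10},  # new record
]
target = apply_changes(target, changes)
# Only two rows were touched; row 1 was never reprocessed
```

Hand-coding this kind of merge - plus SCD history tracking, late-arriving data, and deletes - is what previously ran to thousands of lines; with it built into the runtime, the pipeline definition only names the target table and merge key.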
Tangible operational and business impact
While NAB is still early in its journey, the results are already clear:
Latency has dropped dramatically, now supporting end-to-end use cases in under 15 minutes
Pipeline success rates have increased from 86% to 99.6%
Data quality has improved by 38%
Onboarding is faster, replacing months of custom training with a single framework
Significant cost reductions are expected as adoption expands
Unlocking self-service and what’s next
As adoption grows, NAB is preparing to extend Spark Declarative Pipelines fully into the Gold layer, unlocking new levels of self-service for business users.
Looking ahead, NAB aims to become the first bank to run 100% of its pipelines on Spark Declarative Pipelines, setting a new standard for how large financial institutions build, operate, and scale data platforms.
“This is a rapidly evolving product,” Puli concludes. “We’re going to keep being curious - there’s a lot more we haven’t explored yet.”
