Why effective model risk management now depends on platform architecture, not procedural compliance.
by Pavithra Rao, Jennifer Miller, Chaitanya Varanasi and Kim Hatton
On April 17, 2026, the Federal Reserve, FDIC and OCC rescinded SR 11-7, OCC 2011-12, FIL-22-2017 and related BSA/AML issuances, replacing them with a more explicitly risk-based, principles-driven framework for model risk management.
This is not a narrow technical update. It reflects a broader view that models are central to how banks make decisions, and that model risk must be governed with the same seriousness as credit or market risk.
For practitioners inside a bank, that translates into a concrete set of expectations: the inventory is tiered by materiality, controls are applied proportionately, and the model lifecycle is defensible end to end.
On a traditional stack, meeting those expectations is two to three quarters of sprint work: inventory migration, validation template rewrites, new monitoring pipelines, documentation refreshes, vendor-model onboarding, and parallel workstreams for GenAI and agentic systems that supervisors now treat as in scope by principle. Every workstream is a project, a change ticket, and an audit exposure.
The real question is not "how do we bring ourselves into compliance with this guidance?" It is "what platform decision makes the next guidance change — and the one after that — a configuration exercise instead of a program?"
The 2026 revision is less a rewrite of controls than a re-segmentation of how they are applied. The shifts that matter most for practitioners share a single thread: evidence must be produced as a byproduct of how models are built, not reconstructed after the fact. That is a platform problem, not a policy problem.
We take the regulatory intent as a given. Rather than debating the guidance, we focus on the operating model it implies. The remainder of this article outlines a reference architecture on Databricks designed to support that operating model on a single governed substrate, because in practice these requirements cannot be reliably composed from a collection of point solutions without recreating the fragmentation MRM is meant to eliminate.
We map the revised MRM expectations onto concrete Databricks capabilities so banks can see how to operationalize these principles on the Lakehouse.
The architecture below is what makes "one lineage graph" more than a slogan. Every lifecycle stage resolves to a governed object in Unity Catalog. The same primitives serve classical ML and GenAI, so the MRM team operates one framework, not two.
| Layer | What It Contains | Why the MRM Team Cares |
| --- | --- | --- |
| Governance Layer | Unity Catalog; Attribute-Based Access Control (ABAC); end-to-end lineage graph; audit logs | One source of truth for inventory, ownership, tier, and access. Lineage makes "how was this prediction produced?" answerable in one query. |
| Data & Feature Layer | Delta Lake (bronze / silver / gold); Lakeflow Declarative Pipelines; Databricks Feature Store; data quality expectations | Data quality is evidenced, not asserted. Feature definitions are versioned, so train/serve consistency is provable. |
| Model Layer | MLflow Tracking (experiments); UC Model Registry (versions, aliases, tags); Mosaic AI Model Serving; Agent Bricks / Mosaic AI Agent Framework | Classical models and GenAI agents register the same way, promote the same way, and carry the same tier tags. |
| Assurance Layer | Lakehouse Monitoring (drift, performance); AI Gateway (guardrails, PII, rate limits); Databricks Apps (validator workflow); Genie spaces (examiner Q&A) | Monitoring, validator review, and examiner interaction all read from the same governed inventory — no parallel tooling. |
The governance layer is not something bolted on at the end — it is what every other layer writes into. That is why a tier change becomes a metadata update rather than a migration, and why an examiner gets one answer from one system.
Each lifecycle stage produces a specific kind of evidence the new guidance expects. The Databricks architecture turns that evidence into a structured byproduct of normal work — not a separate compliance pass at the end.
| Lifecycle Stage | MRM Expectation | Databricks Component | Evidence Produced |
| --- | --- | --- | --- |
| Data sourcing | Data quality, provenance, fit for purpose | Unity Catalog, Delta Lake, Lakeflow Declarative Pipelines with expectations | Column-level lineage, DQ metrics, reproducible point-in-time snapshots |
| Feature engineering | Versioned, consistent feature definitions across train and serve | Feature Store on UC, online/offline stores | Feature version history, consumer-model list, skew detection |
| Model development | Reproducibility, documented assumptions, technique justification | MLflow Tracking with Git, automated experiment logging | Run history, hyperparameters, metrics, code commit, environment |
| Independent validation | Champion/challenger, sensitivity analysis, bias and fairness testing | MLflow Evaluate, separate validator workspace, Databricks Apps for workflow | Versioned challenger artifacts, fairness metrics, validator sign-off bound to model version |
| Deployment | Controlled promotion, rollback capability, role-based approval | UC Model Registry aliases, Mosaic AI Model Serving, ABAC promotion policies | Promotion history, approver identity, atomic rollback path |
| Monitoring | Continuous performance and drift monitoring, proportionate to tier | Lakehouse Monitoring on inference tables, custom fairness metrics | Drift dashboards, threshold breaches, alert history in one system of record |
| Documentation | Current development, validation, and change documentation | Auto-generated model cards, Genie spaces for natural-language queries | Living documentation bound to the production model version, not a PDF from last quarter |
| Retirement | Controlled decommissioning with preserved audit trail | Registry lifecycle states, Delta Lake retention of training artifacts | Retirement record, final monitoring state, preserved lineage |
Any individual capability here can be assembled from point tools. The architectural point is that on Databricks they form one lineage graph. The examiner's question "what data trained this model, who validated it, how has it drifted, and which production decisions used it?" is answered in a single traversal, not a cross-team evidence-gathering exercise.
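To make the "evidence produced" column concrete, here is a minimal sketch of the model development stage, assuming an MLflow environment configured for Unity Catalog; the run name, metric, and synthetic dataset are illustrative stand-ins, not part of the guidance or the reference architecture.

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_registry_uri("databricks-uc")   # registered models live in Unity Catalog
mlflow.autolog()                           # parameters, metrics, and environment captured automatically

# Synthetic stand-in for governed training data pulled from the gold layer.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=7)

with mlflow.start_run(run_name="pd_model_candidate"):
    model = LogisticRegression(max_iter=500).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))

# The run now carries hyperparameters, metrics, code version, and environment:
# the development-stage evidence described above, produced as a byproduct of training.
```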
Every model in the registry carries structured tags: materiality tier, business line, guidance version, assigned validator, last validation date. These tags are not decoration — they are read by access policies, monitoring thresholds, and the portfolio-level MRM dashboard.
When supervisors refine materiality definitions — or when internal policy does — the tier changes. On this architecture, a tier change is a tag update, applied in minutes, visible across every downstream control. There is no re-platforming, no pipeline rewrite, no documentation redrafting.
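A minimal sketch of what that looks like through the MLflow client, assuming a Unity Catalog registered model; the model name, tag keys, and values below are illustrative conventions rather than a prescribed schema.

```python
from mlflow import MlflowClient

client = MlflowClient(registry_uri="databricks-uc")
model_name = "classical_ml.credit.pd_model"   # illustrative three-level UC name

# Governance tags read by access policies, monitors, and the MRM dashboard.
inventory_tags = {
    "mrm_tier": "Tier1",
    "business_line": "retail_credit",
    "guidance_version": "2026-04",
    "assigned_validator": "mrm_validation_team_a",
    "last_validation_date": "2026-06-30",
}
for key, value in inventory_tags.items():
    client.set_registered_model_tag(model_name, key, value)

# A re-tiering under revised materiality definitions is one tag update, not a migration.
client.set_registered_model_tag(model_name, "mrm_tier", "Tier2")
```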
Proportionality is the guidance's central principle, and historically the hardest to evidence. On Databricks, it becomes an attribute-based access rule tied to the tier tag.
In practice, this looks like simple ABAC policies on Unity Catalog objects. For example:
• Tier-1 material models: promotion to production requires approval from the independent MRM validator group. Dual control is enforced, not encouraged.
• Tier-2 standard models: team lead plus validator can promote. Lighter oversight, still auditable.
• Tier-3 low-materiality models: model owner can promote within their own workspace; monitoring thresholds are looser; documentation requirements are reduced.
Expressed as policy logic against the tier tag, the Tier-1 rule looks roughly like this:
IF model.tier = 'Tier1'
THEN require_approver_role IN ('MRM_Validator', 'Model_Risk_Committee')
AND require_dual_control = TRUE
The same tier tag can also drive stricter monitoring thresholds and shorter validation cycles, without custom code per model. The bank does not need a separate policy document to explain proportionality; access control logs and configuration demonstrate it, model by model, promotion by promotion.
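As an illustration of how the tier tag can gate promotion, here is a simplified application-level sketch using the MLflow client; in practice the bank would enforce this through Unity Catalog privileges and its CI/CD approval workflow, and the group names, tier values, and model name below are assumptions.

```python
from mlflow import MlflowClient

client = MlflowClient(registry_uri="databricks-uc")

# Approval groups required before the 'champion' alias may move, keyed by the tier tag.
REQUIRED_APPROVALS = {
    "Tier1": {"MRM_Validator", "Model_Risk_Committee"},  # dual control enforced
    "Tier2": {"MRM_Validator"},                          # validator sign-off
    "Tier3": set(),                                      # owner self-service
}

def promote_to_production(model_name: str, version: str, approver_groups: set) -> None:
    tier = client.get_registered_model(model_name).tags.get("mrm_tier", "Tier3")
    missing = REQUIRED_APPROVALS[tier] - approver_groups
    if missing:
        raise PermissionError(f"{model_name} is {tier}; missing approvals from {sorted(missing)}")
    client.set_registered_model_alias(model_name, "champion", version)

# Example: a Tier-1 promotion with both required approvals present.
promote_to_production("classical_ml.credit.pd_model", "7",
                      {"MRM_Validator", "Model_Risk_Committee"})
```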
A clean catalog hierarchy is the single most underrated governance decision. A workable pattern separates inventory and evidence from the models themselves:
• Inventory catalog — holds model metadata, validator sign-offs, inventory overlays, and validator queue tables. Key tables in this catalog follow a simple pattern:
  • models.inventory — one row per model version, with fields such as tier, owner, guidance_version, intended_use, and dependent_processes.
  • models.validation_log — one row per validation event, keyed by model_version_id, with validator_id, validation_scope, issues_found, and residual_risk_rating.
• Classical ML catalog — per-business-line schemas for credit, AML, fraud, and capital models.
• GenAI catalog — LLM endpoints and agents, registered as first-class models with tool registries.
• Monitoring catalog — drift, performance, and fairness metric tables produced by Lakehouse Monitoring.
• Evidence catalog — challenger runs, validation artifacts, model cards, and retired model archives.
This separation lets MRM leadership grant read-only access to evidence and monitoring without exposing the underlying training data — a common sticking point in exam prep.
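A minimal DDL sketch of that hierarchy, assuming it runs in a Databricks notebook where `spark` is available; the catalog, schema, table, and group names are illustrative rather than a required convention.

```python
# Create the separated catalogs described above.
for catalog in ("mrm_inventory", "classical_ml", "genai", "mrm_monitoring", "mrm_evidence"):
    spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog}")

spark.sql("CREATE SCHEMA IF NOT EXISTS mrm_inventory.models")

# One row per model version, mirroring the inventory fields described above.
spark.sql("""
    CREATE TABLE IF NOT EXISTS mrm_inventory.models.inventory (
        model_version_id    STRING,
        model_name          STRING,
        tier                STRING,
        owner               STRING,
        business_line       STRING,
        guidance_version    STRING,
        intended_use        STRING,
        dependent_processes ARRAY<STRING>
    )
""")

# Read-only evidence access for MRM leadership without exposing training data.
spark.sql("GRANT USE CATALOG, SELECT ON CATALOG mrm_evidence TO `mrm_leadership`")
```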
Banks are running both at once: a PD model governed by decades of MRM practice, and an LLM-based AML triage assistant that no one has figured out how to govern yet. The traditional instinct is to build a second framework for the second type of model. That doubles the cost, doubles the audit surface, and guarantees divergence.
On Databricks, classical and GenAI share the same registry, the same lifecycle stages, and the same evidence pattern — with layer-specific capabilities where the model type demands them.
| Lifecycle Concern | Classical ML (credit, AML, fraud) | GenAI & Agentic Systems |
| --- | --- | --- |
| Registration | UC Model Registry entry with version, owner, tier tag | Same registry: LLM endpoints and Agent Bricks apps registered as first-class models with tool registries |
| Evaluation | MLflow Evaluate: AUC, KS, PSI, fairness across protected attributes | MLflow LLM evaluation: groundedness, relevance, toxicity, LLM-as-judge on domain-specific criteria |
| Effective challenge | Champion/challenger models, benchmark datasets, backtesting | Prompt and model variants, eval sets with expected outputs, agent trace comparison |
| Monitoring | Lakehouse Monitoring: performance, drift, fairness on inference tables | MLflow tracing plus AI Gateway telemetry: latency, cost, hallucination rate, guardrail trigger rate |
| Access & guardrails | UC ABAC on features, models, and serving endpoints | AI Gateway: PII redaction, rate limits, safety filters, approved-model allowlist |
| Documentation | Auto-generated model card with data and feature lineage | Same model card structure plus prompt versions, agent graph, tool registry |
When supervisors extend MRM principles to GenAI — which they are already doing — we do not stand up a second framework. We apply the first one.
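As one example of that shared evaluation surface, here is a hedged sketch of the classical side using `mlflow.evaluate`; the GenAI side uses the same entry point with an LLM task type and judge-based metrics. The dataset, model, and column names are synthetic placeholders.

```python
import mlflow
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a held-out challenger evaluation set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
eval_df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])]).assign(label=y)

champion = LogisticRegression(max_iter=500).fit(X, y)

with mlflow.start_run(run_name="pd_model_challenger_eval"):
    logged = mlflow.sklearn.log_model(champion, artifact_path="model")
    result = mlflow.evaluate(
        model=logged.model_uri,
        data=eval_df,
        targets="label",
        model_type="classifier",   # for GenAI, a task such as "question-answering"
    )                              # yields relevance/toxicity-style metrics from the same API
    print(result.metrics)          # versioned evaluation evidence attached to the run
```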
• Work in a governed notebook environment where tracking, lineage, and feature registration are automatic — not compliance checkboxes added at the end.
• Iterate on baselines and agentic patterns quickly with AutoML and Agent Bricks; every iteration is logged and reproducible.
• Ship faster because promotion, monitoring, and documentation are built into the same workflow — not handed off to a separate team.
• Read-only access to the exact training data, feature versions, and code that produced the model — no data copies, no staleness.
• Challenger and benchmark runs versioned alongside the champion; sensitivity analyses reproducible on demand.
• Sign-off is itself a first-class artifact in the registry, tied to the model version — not a memo attached to an email thread (a sketch follows this list).
• Databricks Apps provide a structured review workflow: queue, comments, sign-off, escalation — all auditable.
• One dashboard across the inventory: tier distribution, validation status, monitoring health, outstanding issues — not five GRC exports stitched together.
• Tier and ownership enforced by ABAC policies. Proportionality is not a policy document; it is an access rule with an audit log.
• Third-party and GenAI models registered the same way as internal models. Coverage gaps are visible before an examiner finds them.
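To make the sign-off bullet concrete, here is a hypothetical sketch of recording a validation decision as a governed artifact; the tag keys, table, and identifiers reuse the illustrative names from earlier in this article and assume a Databricks notebook where `spark` is available.

```python
from datetime import date
from mlflow import MlflowClient

client = MlflowClient(registry_uri="databricks-uc")
model_name, version = "classical_ml.credit.pd_model", "7"

# Bind the sign-off to the specific model version in the registry.
client.set_model_version_tag(model_name, version, "validation_status", "approved")
client.set_model_version_tag(model_name, version, "validated_on", date.today().isoformat())

# Append the validation event to the inventory catalog's log table.
spark.sql(f"""
    INSERT INTO mrm_inventory.models.validation_log
        (model_version_id, validator_id, validation_scope, issues_found, residual_risk_rating)
    VALUES ('{model_name}/{version}', 'validator_042', 'full_revalidation', 2, 'medium')
""")
```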
Consider a representative question from a supervisory review: "Show us the validation evidence, production performance, and drift history for the credit PD model over the past twelve months, sliced by business line."
On a fragmented stack, this is a two-week evidence-gathering exercise across the registry, the data lake, the BI tool, and the GRC system — each with its own identity model and data freshness. On the Databricks reference architecture:
• The validation evidence lives in the inventory catalog, tied to the model version.
• Production performance and drift history live in the monitoring catalog, continuously written by Lakehouse Monitoring.
• Business line is a tag on the model and a slicing dimension on the monitor.
• A Genie space over the MRM catalog answers the question in natural language, with row-level access filters ensuring the examiner sees only what they are entitled to.
Turnaround moves from weeks to hours. More importantly, the evidence is the same evidence the bank's own MRM team uses — so there is no discrepancy between what the bank reports internally and what it shows the examiner.
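A hypothetical version of that traversal as a single query, assuming the illustrative inventory and monitoring tables sketched earlier; the drift-metric table and its columns are placeholders for whatever schema the bank's Lakehouse Monitoring output actually exposes.

```python
evidence = spark.sql("""
    SELECT i.model_name,
           i.tier,
           i.business_line,
           d.window_start,
           d.metric_name,
           d.metric_value
    FROM   mrm_inventory.models.inventory       AS i
    JOIN   mrm_monitoring.credit.pd_model_drift AS d
           ON d.model_version_id = i.model_version_id
    WHERE  i.model_name   = 'classical_ml.credit.pd_model'
      AND  d.window_start >= date_sub(current_date(), 365)
    ORDER  BY i.business_line, d.window_start
""")
display(evidence)   # the same result set the Genie space reasons over
```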
The 2026 guidance requires banks to "shift left," moving risk controls to the very start of the model lifecycle. With Lakeflow Declarative Pipelines, governance becomes an automated part of the data flow rather than a manual hurdle. Instead of auditing models after they are built, the pipelines use built-in quality expectations to block non-compliant data or unstable features before they reach the Model Registry. This makes every asset in the medallion architecture compliant by design, with a complete audit trail generated as a natural byproduct of development. By automating parts of effective challenge through these pipelines, MRM teams spend less time on manual data gathering and more time on high-level oversight.
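A minimal sketch of such a pipeline with quality expectations, assuming the Python `dlt` API for Lakeflow Declarative Pipelines; the source table, rule names, and thresholds are illustrative.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Silver borrower table feeding the PD feature pipeline.")
@dlt.expect_or_drop("valid_borrower_id", "borrower_id IS NOT NULL")   # quarantine bad rows
@dlt.expect_or_fail("ltv_in_range", "loan_to_value BETWEEN 0 AND 2")  # halt on structural breaks
def silver_borrowers():
    # Non-compliant records never reach the feature store or the Model Registry;
    # expectation results land in the pipeline event log as audit evidence.
    return (
        dlt.read_stream("bronze_borrowers")
           .withColumn("ingested_at", F.current_timestamp())
    )
```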

Every regulatory response draws from a finite pool of MRM analysts, model developers, and validators. How that capacity gets spent is the difference between a platform that helps and one that drags, and it is where a unified substrate pays off structurally.
The structural argument for Databricks is not that it handles this guidance change faster — though it does — but that it converts the next one, and the one after that, from a program into a configuration.
A notable constraint on a bank's AI roadmap is not just compute or data — it is the human capacity of model risk teams and the Center of Excellence (CoE). As the current guidance expands the definition of "model-like" systems to include GenAI and agentic workflows, the volume of validation requests will outpace the headcount of qualified practitioners.
Rather than every LLM prototype requiring a bespoke manual review, Databricks allows the CoE to codify the bank's standard into a first-pass automation layer.
The practical problem is familiar: a business unit wants to ship an LLM assistant in four weeks, while the CoE has a six-month backlog.
Databricks addresses this by letting the CoE delegate execution while retaining control. The CoE provides the automation harness (the monitoring, model cards, and metrics that make oversight repeatable), and the business moves at GenAI speed. The 2026 guidance turns from a bottleneck into a guardrail.
The April 2026 guidance is not the last supervisory shift we will see this cycle. Agentic AI principles, third-party model oversight, and climate risk modeling are all in motion. The question is whether our platform turns each of those into a three-quarter project or a four-week prototype. That choice is made once.