Why effective model risk management now depends on platform architecture, not procedural compliance.
by Pavithra Rao, Jennifer Miller, Chaitanya Varanasi and Kim Hatton
On April 17, 2026, the Federal Reserve, FDIC and OCC rescinded SR 11-7, OCC 2011-12, FIL-22-2017 and related BSA/AML issuances, replacing them with a more explicitly risk-based, principles-driven framework for model risk management.
This is not a narrow technical update. It reflects a broader view that models are central to how banks make decisions, and that model risk must be governed with the same seriousness as credit or market risk.
For practitioners inside a bank, that translates into a concrete set of expectations: the inventory is tiered by materiality, controls are applied proportionately, and the model lifecycle is defensible end to end.
On a traditional stack, meeting those expectations is two to three quarters of sprint work: inventory migration, validation template rewrites, new monitoring pipelines, documentation refreshes, vendor-model onboarding, and parallel workstreams for GenAI and agentic systems that supervisors now treat as in scope by principle. Every workstream is a project, a change ticket, and an audit exposure.
The real question is not "how do we bring ourselves into compliance with this guidance?" It is "what platform decision makes the next guidance change — and the one after that — a configuration exercise instead of a program?"
The 2026 revision is less a rewrite of controls than a re-segmentation of how they are applied. The shifts that matter most for practitioners share a single thread: evidence must be produced as a byproduct of how models are built, not reconstructed after the fact. That is a platform problem, not a policy problem.
We take the regulatory intent as a given. Rather than debating the guidance, we focus on the operating model it implies. The remainder of this article outlines a reference architecture on Databricks designed to support that operating model on a single governed substrate, because in practice these requirements cannot be reliably composed from a collection of point solutions without recreating the fragmentation MRM is meant to eliminate.
We map the revised MRM expectations onto concrete Databricks capabilities so banks can see how to operationalize these principles on the Lakehouse.
The architecture below is what makes "one lineage graph" more than a slogan. Every lifecycle stage resolves to a governed object in Unity Catalog. The same primitives serve classical ML and GenAI, so the MRM team operates one framework, not two.
| Layer | What It Contains | Why the MRM Team Cares |
| --- | --- | --- |
| Governance Layer | Unity Catalog; Attribute-Based Access Control (ABAC); end-to-end lineage graph; audit logs | One source of truth for inventory, ownership, tier, and access. Lineage makes "how was this prediction produced?" answerable in one query. |
| Data & Feature Layer | Delta Lake (bronze / silver / gold); Lakeflow Declarative Pipelines; Databricks Feature Store; data quality expectations | Data quality is evidenced, not asserted. Feature definitions are versioned, so train/serve consistency is provable. |
| Model Layer | MLflow Tracking (experiments); UC Model Registry (versions, aliases, tags); Mosaic AI Model Serving; Agent Bricks / Mosaic AI Agent Framework | Classical models and GenAI agents register the same way, promote the same way, and carry the same tier tags. |
| Assurance Layer | Lakehouse Monitoring (drift, performance); AI Gateway (guardrails, PII, rate limits); Databricks Apps (validator workflow); Genie spaces (examiner Q&A) | Monitoring, validator review, and examiner interaction all read from the same governed inventory — no parallel tooling. |
The governance layer is not something bolted on at the end — it is what every other layer writes into. That is why a tier change becomes a metadata update rather than a migration, and why an examiner gets one answer from one system.
Each lifecycle stage produces a specific kind of evidence the new guidance expects. The Databricks architecture turns that evidence into a structured byproduct of normal work — not a separate compliance pass at the end.
| Lifecycle Stage | MRM Expectation | Databricks Component | Evidence Produced |
| --- | --- | --- | --- |
| Data sourcing | Data quality, provenance, fit for purpose | Unity Catalog, Delta Lake, Lakeflow Declarative Pipelines with expectations | Column-level lineage, DQ metrics, reproducible point-in-time snapshots |
| Feature engineering | Versioned, consistent feature definitions across train and serve | Feature Store on UC, online/offline stores | Feature version history, consumer-model list, skew detection |
| Model development | Reproducibility, documented assumptions, technique justification | MLflow Tracking with Git, automated experiment logging | Run history, hyperparameters, metrics, code commit, environment |
| Independent validation | Champion/challenger, sensitivity analysis, bias and fairness testing | MLflow Evaluate, separate validator workspace, Databricks Apps for workflow | Versioned challenger artifacts, fairness metrics, validator sign-off bound to model version |
| Deployment | Controlled promotion, rollback capability, role-based approval | UC Model Registry aliases, Mosaic AI Model Serving, ABAC promotion policies | Promotion history, approver identity, atomic rollback path |
| Monitoring | Continuous performance and drift monitoring, proportionate to tier | Lakehouse Monitoring on inference tables, custom fairness metrics | Drift dashboards, threshold breaches, alert history in one system of record |
| Documentation | Current development, validation, and change documentation | Auto-generated model cards, Genie spaces for natural-language queries | Living documentation bound to the production model version, not a PDF from last quarter |
| Retirement | Controlled decommissioning with preserved audit trail | Registry lifecycle states, Delta Lake retention of training artifacts | Retirement record, final monitoring state, preserved lineage |
Any individual capability here can be assembled from point tools. The architectural point is that on Databricks they form one lineage graph. The examiner's question "what data trained this model, who validated it, how has it drifted, and which production decisions used it?" is answered in a single traversal, not a cross-team evidence-gathering exercise.
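To make the "evidence produced" column concrete, here is a minimal sketch of the model development stage, assuming an MLflow environment configured for Unity Catalog; the run name, metric, and synthetic dataset are illustrative stand-ins, not part of the guidance or the reference architecture.

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_registry_uri("databricks-uc")   # registered models live in Unity Catalog
mlflow.autolog()                           # parameters, metrics, and environment captured automatically

# Synthetic stand-in for governed training data pulled from the gold layer.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=7)

with mlflow.start_run(run_name="pd_model_candidate"):
    model = LogisticRegression(max_iter=500).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))

# The run now carries hyperparameters, metrics, code version, and environment:
# the development-stage evidence described above, produced as a byproduct of training.
```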
Every model in the registry carries structured tags: materiality tier, business line, guidance version, assigned validator, last validation date. These tags are not decoration — they are read by access policies, monitoring thresholds, and the portfolio-level MRM dashboard.
When supervisors refine materiality definitions — or when internal policy does — the tier changes. On this architecture, a tier change is a tag update, applied in minutes, visible across every downstream control. There is no re-platforming, no pipeline rewrite, no documentation redrafting.
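A minimal sketch of what that looks like through the MLflow client, assuming a Unity Catalog registered model; the model name, tag keys, and values below are illustrative conventions rather than a prescribed schema.

```python
from mlflow import MlflowClient

client = MlflowClient(registry_uri="databricks-uc")
model_name = "classical_ml.credit.pd_model"   # illustrative three-level UC name

# Governance tags read by access policies, monitors, and the MRM dashboard.
inventory_tags = {
    "mrm_tier": "Tier1",
    "business_line": "retail_credit",
    "guidance_version": "2026-04",
    "assigned_validator": "mrm_validation_team_a",
    "last_validation_date": "2026-06-30",
}
for key, value in inventory_tags.items():
    client.set_registered_model_tag(model_name, key, value)

# A re-tiering under revised materiality definitions is one tag update, not a migration.
client.set_registered_model_tag(model_name, "mrm_tier", "Tier2")
```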
Proportionality is the guidance's central principle, and historically the hardest to evidence. On Databricks, it becomes an attribute-based access rule tied to the tier tag.
In practice, this looks like simple ABAC policies on Unity Catalog objects. For example:
• Tier-1 material models: promotion to production requires approval from the independent MRM validator group. Dual control is enforced, not encouraged.
• Tier-2 standard models: team lead plus validator can promote. Lighter oversight, still auditable.
• Tier-3 low-materiality models: model owner can promote within their own workspace; monitoring thresholds are looser; documentation requirements are reduced.
Expressed as policy logic against the tier tag, the Tier-1 rule looks roughly like this:
IF model.tier = 'Tier1'
THEN require_approver_role IN ('MRM_Validator', 'Model_Risk_Committee')
AND require_dual_control = TRUE
The same tier tag can also drive stricter monitoring thresholds and shorter validation cycles, without custom code per model. The bank does not need a separate policy document to explain proportionality; access control logs and configuration demonstrate it, model by model, promotion by promotion.
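As an illustration of how the tier tag can gate promotion, here is a simplified application-level sketch using the MLflow client; in practice the bank would enforce this through Unity Catalog privileges and its CI/CD approval workflow, and the group names, tier values, and model name below are assumptions.

```python
from mlflow import MlflowClient

client = MlflowClient(registry_uri="databricks-uc")

# Approval groups required before the 'champion' alias may move, keyed by the tier tag.
REQUIRED_APPROVALS = {
    "Tier1": {"MRM_Validator", "Model_Risk_Committee"},  # dual control enforced
    "Tier2": {"MRM_Validator"},                          # validator sign-off
    "Tier3": set(),                                      # owner self-service
}

def promote_to_production(model_name: str, version: str, approver_groups: set) -> None:
    tier = client.get_registered_model(model_name).tags.get("mrm_tier", "Tier3")
    missing = REQUIRED_APPROVALS[tier] - approver_groups
    if missing:
        raise PermissionError(f"{model_name} is {tier}; missing approvals from {sorted(missing)}")
    client.set_registered_model_alias(model_name, "champion", version)

# Example: a Tier-1 promotion with both required approvals present.
promote_to_production("classical_ml.credit.pd_model", "7",
                      {"MRM_Validator", "Model_Risk_Committee"})
```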
A clean catalog hierarchy is the single most underrated governance decision. A workable pattern separates inventory and evidence from the models themselves:
• Inventory catalog — holds model metadata, validator sign-offs, inventory overlays, and validator queue tables. Key tables in this catalog follow a simple pattern:
  • models.inventory — one row per model version, with fields such as tier, owner, guidance_version, intended_use, and dependent_processes.
  • models.validation_log — one row per validation event, keyed by model_version_id, with validator_id, validation_scope, issues_found, and residual_risk_rating.
• Classical ML catalog — per-business-line schemas for credit, AML, fraud, and capital models.
• GenAI catalog — LLM endpoints and agents, registered as first-class models with tool registries.
• Monitoring catalog — drift, performance, and fairness metric tables produced by Lakehouse Monitoring.
• Evidence catalog — challenger runs, validation artifacts, model cards, and retired model archives.
This separation lets MRM leadership grant read-only access to evidence and monitoring without exposing the underlying training data — a common sticking point in exam prep.
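A minimal DDL sketch of that hierarchy, assuming it runs in a Databricks notebook where `spark` is available; the catalog, schema, table, and group names are illustrative rather than a required convention.

```python
# Create the separated catalogs described above.
for catalog in ("mrm_inventory", "classical_ml", "genai", "mrm_monitoring", "mrm_evidence"):
    spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog}")

spark.sql("CREATE SCHEMA IF NOT EXISTS mrm_inventory.models")

# One row per model version, mirroring the inventory fields described above.
spark.sql("""
    CREATE TABLE IF NOT EXISTS mrm_inventory.models.inventory (
        model_version_id    STRING,
        model_name          STRING,
        tier                STRING,
        owner               STRING,
        business_line       STRING,
        guidance_version    STRING,
        intended_use        STRING,
        dependent_processes ARRAY<STRING>
    )
""")

# Read-only evidence access for MRM leadership without exposing training data.
spark.sql("GRANT USE CATALOG, SELECT ON CATALOG mrm_evidence TO `mrm_leadership`")
```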
Banks are running both at once: a PD model governed by decades of MRM practice, and an LLM-based AML triage assistant that no one has figured out how to govern yet. The traditional instinct is to build a second framework for the second type of model. That doubles the cost, doubles the audit surface, and guarantees divergence.
On Databricks, classical and GenAI share the same registry, the same lifecycle stages, and the same evidence pattern — with layer-specific capabilities where the model type demands them.
| Lifecycle Concern | Classical ML (credit, AML, fraud) | GenAI & Agentic Systems |
| --- | --- | --- |
| Registration | UC Model Registry entry with version, owner, tier tag | Same registry: LLM endpoints and Agent Bricks apps registered as first-class models with tool registries |
| Evaluation | MLflow Evaluate: AUC, KS, PSI, fairness across protected attributes | MLflow LLM evaluation: groundedness, relevance, toxicity, LLM-as-judge on domain-specific criteria |
| Effective challenge | Champion/challenger models, benchmark datasets, backtesting | Prompt and model variants, eval sets with expected outputs, agent trace comparison |
| Monitoring | Lakehouse Monitoring: performance, drift, fairness on inference tables | MLflow tracing plus AI Gateway telemetry: latency, cost, hallucination rate, guardrail trigger rate |
| Access & guardrails | UC ABAC on features, models, and serving endpoints | AI Gateway: PII redaction, rate limits, safety filters, approved-model allowlist |
| Documentation | Auto-generated model card with data and feature lineage | Same model card structure plus prompt versions, agent graph, tool registry |
When supervisors extend MRM principles to GenAI — which they are already doing — we do not stand up a second framework. We apply the first one.
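As one example of that shared evaluation surface, here is a hedged sketch of the classical side using `mlflow.evaluate`; the GenAI side uses the same entry point with an LLM task type and judge-based metrics. The dataset, model, and column names are synthetic placeholders.

```python
import mlflow
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a held-out challenger evaluation set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
eval_df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])]).assign(label=y)

champion = LogisticRegression(max_iter=500).fit(X, y)

with mlflow.start_run(run_name="pd_model_challenger_eval"):
    logged = mlflow.sklearn.log_model(champion, artifact_path="model")
    result = mlflow.evaluate(
        model=logged.model_uri,
        data=eval_df,
        targets="label",
        model_type="classifier",   # for GenAI, a task such as "question-answering"
    )                              # yields relevance/toxicity-style metrics from the same API
    print(result.metrics)          # versioned evaluation evidence attached to the run
```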
• Work in a governed notebook environment where tracking, lineage, and feature registration are automatic — not compliance checkboxes added at the end.
• Iterate on baselines and agentic patterns quickly with AutoML and Agent Bricks; every iteration is logged and reproducible.
• Ship faster because promotion, monitoring, and documentation are built into the same workflow — not handed off to a separate team.
• Read-only access to the exact training data, feature versions, and code that produced the model — no data copies, no staleness.
• Challenger and benchmark runs versioned alongside the champion; sensitivity analyses reproducible on demand.
• Sign-off is itself a first-class artifact in the registry, tied to the model version — not a memo attached to an email thread (a sketch follows this list).
• Databricks Apps provide a structured review workflow: queue, comments, sign-off, escalation — all auditable.
• One dashboard across the inventory: tier distribution, validation status, monitoring health, outstanding issues — not five GRC exports stitched together.
• Tier and ownership enforced by ABAC policies. Proportionality is not a policy document; it is an access rule with an audit log.
• Third-party and GenAI models registered the same way as internal models. Coverage gaps are visible before an examiner finds them.
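To make the sign-off bullet concrete, here is a hypothetical sketch of recording a validation decision as a governed artifact; the tag keys, table, and identifiers reuse the illustrative names from earlier in this article and assume a Databricks notebook where `spark` is available.

```python
from datetime import date
from mlflow import MlflowClient

client = MlflowClient(registry_uri="databricks-uc")
model_name, version = "classical_ml.credit.pd_model", "7"

# Bind the sign-off to the specific model version in the registry.
client.set_model_version_tag(model_name, version, "validation_status", "approved")
client.set_model_version_tag(model_name, version, "validated_on", date.today().isoformat())

# Append the validation event to the inventory catalog's log table.
spark.sql(f"""
    INSERT INTO mrm_inventory.models.validation_log
        (model_version_id, validator_id, validation_scope, issues_found, residual_risk_rating)
    VALUES ('{model_name}/{version}', 'validator_042', 'full_revalidation', 2, 'medium')
""")
```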
Consider a representative question from a supervisory review: "Show us the validation evidence, production performance, and drift history for the credit PD model over the past twelve months, sliced by business line."
On a fragmented stack, this is a two-week evidence-gathering exercise across the registry, the data lake, the BI tool, and the GRC system — each with its own identity model and data freshness. On the Databricks reference architecture:
• The validation evidence lives in the inventory catalog, tied to the model version.
• Production performance and drift history live in the monitoring catalog, continuously written by Lakehouse Monitoring.
• Business line is a tag on the model and a slicing dimension on the monitor.
• A Genie space over the MRM catalog answers the question in natural language, with row-level access filters ensuring the examiner sees only what they are entitled to.
Turnaround moves from weeks to hours. More importantly, the evidence is the same evidence the bank's own MRM team uses — so there is no discrepancy between what the bank reports internally and what it shows the examiner.
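A hypothetical version of that traversal as a single query, assuming the illustrative inventory and monitoring tables sketched earlier; the drift-metric table and its columns are placeholders for whatever schema the bank's Lakehouse Monitoring output actually exposes.

```python
evidence = spark.sql("""
    SELECT i.model_name,
           i.tier,
           i.business_line,
           d.window_start,
           d.metric_name,
           d.metric_value
    FROM   mrm_inventory.models.inventory       AS i
    JOIN   mrm_monitoring.credit.pd_model_drift AS d
           ON d.model_version_id = i.model_version_id
    WHERE  i.model_name   = 'classical_ml.credit.pd_model'
      AND  d.window_start >= date_sub(current_date(), 365)
    ORDER  BY i.business_line, d.window_start
""")
display(evidence)   # the same result set the Genie space reasons over
```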
The 2026 guidance requires banks to "shift left," moving risk controls to the very start of the model lifecycle. With Lakeflow Declarative Pipelines, governance becomes an automated part of the data flow rather than a manual hurdle. Instead of auditing models after they are built, the pipelines use built-in quality expectations to block non-compliant data or unstable features before they reach the Model Registry. This makes every asset in the medallion architecture compliant by design, with a complete audit trail generated as a natural byproduct of development. By automating parts of effective challenge through these pipelines, MRM teams spend less time on manual data gathering and more time on high-level oversight.
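A minimal sketch of such a pipeline with quality expectations, assuming the Python `dlt` API for Lakeflow Declarative Pipelines; the source table, rule names, and thresholds are illustrative.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Silver borrower table feeding the PD feature pipeline.")
@dlt.expect_or_drop("valid_borrower_id", "borrower_id IS NOT NULL")   # quarantine bad rows
@dlt.expect_or_fail("ltv_in_range", "loan_to_value BETWEEN 0 AND 2")  # halt on structural breaks
def silver_borrowers():
    # Non-compliant records never reach the feature store or the Model Registry;
    # expectation results land in the pipeline event log as audit evidence.
    return (
        dlt.read_stream("bronze_borrowers")
           .withColumn("ingested_at", F.current_timestamp())
    )
```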

Every regulatory response draws from a finite pool of MRM analysts, model developers, and validators. How that capacity gets spent is the difference between a platform that helps and one that drags, and it is where a unified substrate pays off structurally.
The structural argument for Databricks is not that it handles this guidance change faster — though it does — but that it converts the next one, and the one after that, from a program into a configuration.
A notable constraint on a bank's AI roadmap is not just compute or data — it is the human capacity of model risk teams and the Center of Excellence (CoE). As the current guidance expands the definition of "model-like" systems to include GenAI and agentic workflows, the volume of validation requests will outpace the headcount of qualified practitioners.
Rather than every LLM prototype requiring a bespoke manual review, Databricks allows the CoE to codify the bank's standard into a first-pass automation layer.
The practical problem is familiar: a business unit wants to ship an LLM assistant in four weeks, while the CoE has a six-month backlog.
Databricks addresses this by letting the CoE delegate execution while retaining control. The CoE provides the automation harness (the monitoring, model cards, and metrics that make oversight repeatable), and the business moves at GenAI speed. The 2026 guidance turns from a bottleneck into a guardrail.
The April 2026 guidance is not the last supervisory shift we will see this cycle. Agentic AI principles, third-party model oversight, and climate risk modeling are all in motion. The question is whether our platform turns each of those into a three-quarter project or a four-week prototype. That choice is made once.