
Governing AI agents at scale with Unity Catalog

Four pillars for governing every model call, tool invocation, and agent interaction in your organization

by David Nasi and Stefania Leone

• AI governance is fundamentally a data governance challenge. By combining lineage, audit logs, inference traces, data quality monitoring, and classification in the lakehouse, organizations can securely govern AI systems while improving observability, compliance, and trust.
• Unity Catalog and Unity AI Gateway provide a unified governance layer for AI agents, models, MCP servers, and data — enforcing identity-aware access, runtime policies, guardrails, and full auditability across every agent interaction.
• Open standards and interoperable governance allow enterprises to govern any model, framework, or agent platform consistently. Unity Catalog and Unity AI Gateway centralize policies, observability, and cost intelligence across Databricks and third-party AI ecosystems.

A year ago, your organization had a dozen AI agents. Today, there are thousands.

Every developer has a coding agent that writes, reviews, and ships code alongside them. Your analytics team built forecasting agents. Sales operations deployed lead scoring. The support organization automated ticket routing. Marketing launched personalization. Finance built reconciliation workflows. Every team saw an opportunity and moved fast.

Now someone asks: "Which agents are accessing customer PII?"

The answer requires pulling logs from dozens of systems, manually correlating them, and hoping nothing was missed. Each agent logs, authenticates, and accesses data differently. There's no single place to look.

Or maybe you took the opposite path. You locked everything down. No agents got deployed without extensive review. Security stayed tight. But now you're six months behind competitors who moved faster. Developers and users are frustrated. Some have left for companies where they can actually use AI tools.

Neither extreme works. Ungoverned agents create risk you can't measure. Locked-down environments create a different kind of risk: falling behind while talent walks out the door.

Traditional governance assumed humans make decisions and applications execute them predictably. Agents don't work that way. They're autonomous, they make different choices each time, and they chain together tools in ways you can't predict by reading code. You can't govern an agent by reviewing what it might do. You govern it by controlling what it can access and monitoring what it actually does.

Four pillars of agent governance

Unity Catalog has governed enterprise data since 2021 through a single permissions model, unified lineage, and a consistent audit trail across every asset. We're now extending that same governance infrastructure to cover every asset an AI system touches: LLMs, MCP servers, skills, and agents. The catalog that already knows who can access your customer data now also governs which agents can call which tools, and under what conditions.

Unity AI Gateway is the enforcement fabric for the agentic world. Every model call, every tool invocation, every agent interaction flows through the gateway. Each one is evaluated against policies defined in Unity Catalog before it executes, and logged after. Traditional governance tools were built for static applications. They have no visibility into any of this. Unity AI Gateway does.

Pillar 1: Delegated access

Agents must operate within clearly defined permission boundaries, both in terms of who they can act on behalf of and what they can access. Most platforms handle this the way they handle application permissions: service accounts with static credentials and broad access. You lose accountability, and you can't contain the blast radius.

Databricks takes a different approach: identity flows end to end, from the user who asks the question to the specific table row the agent retrieves. Agents inherit the invoking user's data permissions in real time via on-behalf-of token passing, not a shared service account. If you can't access a table in Unity Catalog, neither can the agent acting on your behalf. Every action is logged against both identities: the real user who triggered the request and the agent that acted on their behalf, capturing which tables were accessed, what operations ran, and when. When something goes wrong, you know exactly where the action came from and who authorized it.
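To make the on-behalf-of model concrete, here is a minimal sketch in plain Python. It is not the Unity Catalog API: the grant set, the `agent_read` helper, and the `AuditEvent` fields are hypothetical stand-ins. What it shows is the shape of the rule. The decision uses only the invoking user's grants, and every attempt, allowed or not, is recorded against both the user and the agent.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical grant store standing in for Unity Catalog privileges:
# (user, table) pairs the user is allowed to read.
GRANTS = {
    ("alice@example.com", "sales.customers"),
    ("alice@example.com", "sales.orders"),
    ("bob@example.com", "sales.orders"),
}

@dataclass
class AuditEvent:
    user: str       # the human who triggered the request
    agent: str      # the agent acting on that user's behalf
    table: str
    operation: str
    allowed: bool
    ts: str

AUDIT_LOG: list[AuditEvent] = []

def agent_read(user: str, agent: str, table: str) -> bool:
    """Authorize an agent's read against the invoking user's grants
    (not a shared service account) and log both identities either way."""
    allowed = (user, table) in GRANTS
    AUDIT_LOG.append(AuditEvent(
        user=user,
        agent=agent,
        table=table,
        operation="SELECT",
        allowed=allowed,
        ts=datetime.now(timezone.utc).isoformat(),
    ))
    return allowed
```

The same agent gets different answers depending on who invoked it: `agent_read("alice@example.com", "forecast-agent", "sales.customers")` succeeds, while the identical call on behalf of Bob is refused, and both attempts show up in the log with both identities attached.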

We extend this model to MCP servers. Teams register external MCP servers (GitHub, Jira, Slack, etc.) in Unity Catalog and govern them like any other securable: permissions, credential management, and full audit logging in one place.

We recognized that the same principle applies at runtime, not just at access time. Knowing that an agent is allowed to call GitHub doesn't tell you whether it should delete a file or merge a pull request. So we built Service Policies: Unity Catalog functions, managed in Unity Catalog and attached to registered MCP servers, that control which tool calls succeed. Every tool call is evaluated before execution. Based on the tool name, its arguments, or the identity of the caller, the policy returns one of three outcomes: allow, deny, or ask the user for consent. If the policy returns deny, the call is blocked.
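The decision logic of a Service Policy can be sketched as an ordinary function. This is a hedged illustration only: real Service Policies are Unity Catalog functions whose exact signature is not shown here, and the tool names, the `TRUSTED_MERGERS` set, and the `call_tool` wrapper are all hypothetical. The sketch shows the three kinds of rules described above (by tool name, by arguments, by caller identity) and the three outcomes.

```python
from enum import Enum
from typing import Any, Callable

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK_CONSENT = "ask_consent"  # pause and ask the user to confirm

# Hypothetical: identities trusted to merge without confirmation.
TRUSTED_MERGERS = {"release-bot@example.com"}
READ_ONLY_TOOLS = {"get_file_contents", "list_pull_requests"}  # illustrative tool names

def github_policy(tool_name: str, args: dict, caller: str) -> Decision:
    """Decide whether a single tool call on a GitHub MCP server may proceed."""
    if tool_name in READ_ONLY_TOOLS:
        return Decision.ALLOW
    if tool_name == "create_issue":
        # Argument-based rule: only allow issues in internal repositories.
        return Decision.ALLOW if args.get("repo", "").startswith("internal/") else Decision.DENY
    if tool_name == "merge_pull_request":
        # Identity-based rule: trusted callers merge; everyone else needs consent.
        return Decision.ALLOW if caller in TRUSTED_MERGERS else Decision.ASK_CONSENT
    # Deletes and any tool the policy does not recognize are denied.
    return Decision.DENY

def call_tool(tool_name: str, args: dict, caller: str, execute: Callable[..., Any]) -> Any:
    """Evaluate the policy before execution; a deny blocks the call."""
    decision = github_policy(tool_name, args, caller)
    if decision is Decision.DENY:
        raise PermissionError(f"{tool_name} blocked by policy")
    if decision is Decision.ASK_CONSENT:
        return "consent_required"
    return execute(**args)
```

Defaulting to deny for unrecognized tools is a design choice in this sketch: a newly added tool on the MCP server stays blocked until someone writes a rule for it.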

At the model layer, guardrails inspect what flows through inference in real time: they scan inputs for PII and jailbreak attempts, and check outputs for hallucinations and sensitive content before they reach the user. They run inline on every request and fail closed, so if a check cannot complete, the request is blocked rather than passed through.
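A minimal sketch of an inline, fail-closed guardrail follows. The regex detectors and helper names are hypothetical and far cruder than a real guardrail, and this sketch covers only PII and jailbreak checks, not hallucination detection. The point it illustrates is the control flow: checks run before and after the model call, and any exception from a check, not just a positive detection, blocks the request.

```python
import re

# Deliberately simple, hypothetical detectors; a real guardrail would be far more robust.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
JAILBREAK_MARKERS = ("ignore previous instructions", "ignore all previous instructions")

def _contains_pii(text: str) -> bool:
    return bool(EMAIL.search(text) or US_SSN.search(text))

def check_input(prompt: str) -> None:
    if _contains_pii(prompt):
        raise ValueError("PII detected in input")
    if any(marker in prompt.lower() for marker in JAILBREAK_MARKERS):
        raise ValueError("possible jailbreak attempt")

def check_output(response: str) -> None:
    if _contains_pii(response):
        raise ValueError("sensitive content in output")

def guarded_call(prompt: str, model) -> dict:
    """Run checks inline around the model call and fail closed: any exception
    from a check, including one caused by a value the detector cannot parse,
    blocks the request."""
    try:
        check_input(prompt)
    except Exception as exc:
        return {"blocked": True, "reason": f"input: {exc}"}
    response = model(prompt)
    try:
        check_output(response)
    except Exception as exc:
        return {"blocked": True, "reason": f"output: {exc}"}
    return {"blocked": False, "response": response}
```

If a model returns something the output check cannot inspect (for example `None`), the regex raises a `TypeError`, and the request is blocked rather than waved through. That is what failing closed means in practice.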

In practice, these three layers work together: permissions control who can call what. Service Policies control whether a specific tool call should proceed in the context of a given request. Guardrails control what content flows in and out.

Pillar 2: Data-centric AI governance

Here's the principle most AI governance tools miss: an agent's behavior is almost entirely determined by the data it has access to. What it can read, how fresh that data is, whether sensitive fields are masked: these aren't AI governance questions. They're data governance questions. Treat them separately, and you end up with two incomplete systems. Treat them together, and governance becomes self-reinforcing.

First, you need a complete audit trail, and regulation is making this non-negotiable. Emerging AI regulations require organizations to demonstrate what their AI systems did, what they were given, and what they produced. AI Gateway writes the full payload of every model call to inference tables: the exact prompt sent, the exact response returned, token counts and latency. Unity Catalog captures every access operation in audit logs, including which principal called what, from which agent and at what time. Both land in your lakehouse as tables, retainable on your terms. Most logging architectures force a trade-off between completeness and cost, requiring you to sample, filter, and set short retention windows. Because Unity AI Gateway captures observability data in your lakehouse, you don't have to.
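The "log everything, don't sample" idea can be sketched as a thin wrapper that writes one row per model call. In this sketch `sqlite3` stands in for a lakehouse table, the column names are illustrative rather than the actual inference-table schema, and the model is assumed to return its text together with provider-reported token counts.

```python
import sqlite3
import time
import uuid

# sqlite3 stands in for a lakehouse table; the columns are illustrative,
# not the actual inference-table schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inference_log (
        request_id TEXT PRIMARY KEY,
        requester TEXT,
        endpoint TEXT,
        prompt TEXT,
        response TEXT,
        input_tokens INTEGER,
        output_tokens INTEGER,
        latency_ms REAL
    )
""")

def logged_call(requester: str, endpoint: str, prompt: str, model) -> str:
    """Call the model and persist the full payload: no sampling, no truncation."""
    start = time.perf_counter()
    result = model(prompt)  # assumed to return text plus provider-reported token counts
    latency_ms = (time.perf_counter() - start) * 1000
    conn.execute(
        "INSERT INTO inference_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), requester, endpoint, prompt, result["text"],
         result["input_tokens"], result["output_tokens"], latency_ms),
    )
    conn.commit()
    return result["text"]
```

Because each row holds the exact prompt and response, the record answers "what was the system given and what did it produce" directly, without reconstructing it from sampled logs.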

Second, that audit data is only as useful as your ability to analyze it, and analysis requires a data platform, not a logging tool. Agent traces are tables in Unity Catalog, queryable with the same SQL you use for everything else: no new query language, no separate tooling. When an agent does something unexpected, the data to investigate is already there: which agents accessed a specific service last week, how much each team is spending on inference, and whether any agent touched credentials or PII. Because the audit data lives next to your business data, you can go further, joining agent behavior against business outcomes to understand not just what agents did, but whether it worked. Lakewatch, Databricks’ agentic SIEM built on the security lakehouse, goes further still, turning the same audit trail into active security intelligence with AI-driven threat detection and response. Attackers are using agents. Defenders should too.
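Two of the investigation questions above translate directly into SQL. The sketch below runs them against an in-memory `sqlite3` database so it is self-contained; the `agent_audit` table, its columns, and the rows are made-up sample data, not the real audit-log schema. The queries themselves are plain SQL and should carry over to a lakehouse table with only minor changes, such as date functions.

```python
import sqlite3

# Made-up sample data in a hypothetical agent_audit table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent_audit (
    agent TEXT, team TEXT, service TEXT, event_date TEXT, cost_usd REAL
);
INSERT INTO agent_audit VALUES
    ('ticket-router', 'support',   'jira',   '2025-06-02', 1.20),
    ('lead-scorer',   'sales-ops', 'github', '2025-06-03', 0.40),
    ('code-reviewer', 'platform',  'github', '2025-06-04', 2.10),
    ('code-reviewer', 'platform',  'github', '2025-05-20', 1.90);
""")

# Which agents accessed a given service within a date window?
WHO_TOUCHED_SERVICE = """
SELECT DISTINCT agent FROM agent_audit
WHERE service = ? AND event_date BETWEEN ? AND ?
ORDER BY agent
"""

# How much is each team spending?
SPEND_BY_TEAM = """
SELECT team, ROUND(SUM(cost_usd), 2) AS spend FROM agent_audit
GROUP BY team
ORDER BY spend DESC
"""
```

For example, `conn.execute(WHO_TOUCHED_SERVICE, ("github", "2025-06-01", "2025-06-07")).fetchall()` returns the two agents that touched GitHub in that week, and the May event falls outside the window.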

Third, you need to know that the data your agents relied on was trustworthy in the first place. A complete audit trail tells you what an agent accessed; it doesn't tell you whether that data was any good. Data quality monitoring continuously tracks freshness and completeness across your catalog. Join it against agent traces, and you move from "the agent gave a wrong answer" to "the agent queried a table that had been flagged as stale," connecting agent behavior to the quality of the data underneath it. Data classification adds a further layer: an agentic AI system continuously scans and tags sensitive columns, such as PII and HIPAA- or GDPR-regulated data, and those tags feed directly into access control. Masked columns remain masked regardless of which agent or framework requests them. The data governance you already have automatically becomes your AI governance.
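The join from "wrong answer" to "stale table" is a single query once traces and quality results are both tables. In this self-contained sketch, `sqlite3` stands in for the lakehouse, and the `agent_traces` and `table_quality` tables, their columns, and the rows are hypothetical. A production version would also compare against the quality status at query time; this one simply uses the current flag.

```python
import sqlite3

# Hypothetical trace and data-quality tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent_traces (trace_id TEXT, agent TEXT, table_name TEXT, queried_at TEXT);
CREATE TABLE table_quality (table_name TEXT, is_stale INTEGER, last_refreshed TEXT);
INSERT INTO agent_traces VALUES
    ('t1', 'forecast-agent', 'sales.orders',   '2025-06-04T10:00'),
    ('t2', 'forecast-agent', 'sales.pipeline', '2025-06-04T10:01'),
    ('t3', 'recon-agent',    'finance.ledger', '2025-06-04T11:30');
INSERT INTO table_quality VALUES
    ('sales.orders',   0, '2025-06-04T06:00'),
    ('sales.pipeline', 1, '2025-05-28T06:00'),
    ('finance.ledger', 0, '2025-06-04T05:00');
""")

# Traces that read a table the quality monitor currently flags as stale.
STALE_READS = """
SELECT t.trace_id, t.agent, t.table_name, q.last_refreshed
FROM agent_traces AS t
JOIN table_quality AS q ON q.table_name = t.table_name
WHERE q.is_stale = 1
ORDER BY t.trace_id
"""
```

Running `conn.execute(STALE_READS).fetchall()` isolates trace `t2`: the forecast agent's answer rested on a pipeline table last refreshed a week earlier.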

Pillar 3: Cost intelligence

Every model call has a price. Most enterprises have no idea who is running up the bill, for what purpose, or whether any of it is working until the invoice arrives and finance is left explaining a number nobody saw coming.

The root cause isn't a broken process. It's missing infrastructure: no metering layer that sees all AI traffic in one place, no tagging system that attributes it to teams or use cases, no spend controls sitting alongside the access controls governing the same resources.

We built that into Unity Catalog and Unity AI Gateway. Usage tracking logs every request, including token counts, latency, requester identity, and model destination, to a single usage table that covers both Databricks-hosted and external providers. Requests can be tagged by team, project, or cost center. Because usage lands as a table alongside your agent traces and business data, you can connect cost to outcomes. An agent that costs $200 and generates $50K in qualified pipeline is a bargain. An agent that costs $200 querying stale data in a loop is a waste. Without joining cost to outcome, you can't tell the difference.
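The $200 comparison above comes down to a group-by and a division. This sketch uses hypothetical `UsageRow` records and an illustrative outcome mapping rather than the real usage-table schema. It aggregates spend by any tag and then divides attributed business value by spend, per agent.

```python
from collections import defaultdict
from typing import NamedTuple

class UsageRow(NamedTuple):
    agent: str
    cost_center: str   # request tag
    cost_usd: float

def spend_by(rows: list[UsageRow], key: str) -> dict[str, float]:
    """Total spend grouped by any tag column, e.g. 'agent' or 'cost_center'."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[getattr(row, key)] += row.cost_usd
    return dict(totals)

def value_per_dollar(rows: list[UsageRow], outcomes: dict[str, float]) -> dict[str, float]:
    """Business value attributed to each agent divided by its inference spend."""
    return {
        agent: outcomes.get(agent, 0.0) / cost
        for agent, cost in spend_by(rows, "agent").items()
        if cost > 0
    }

# Illustrative figures mirroring the $200 example in the text.
usage = [
    UsageRow("lead-scorer", "sales-ops", 120.0),
    UsageRow("lead-scorer", "sales-ops", 80.0),
    UsageRow("stale-loop-agent", "analytics", 200.0),
]
pipeline_usd = {"lead-scorer": 50_000.0}
```

Both agents spend the same $200, so a spend-only view can't tell them apart. `value_per_dollar(usage, pipeline_usd)` gives 250.0 for the lead scorer and 0.0 for the looping agent.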

Budgets in Unity AI Gateway add the policy layer. Admins set monthly spend thresholds per user or group and get alerted when consumption approaches or crosses them: the signal you need before spend becomes a problem, not after. Hard enforcement is the natural next step, and we'll have more to share on that soon.
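Alert-only budgets reduce to classifying month-to-date spend against a threshold. In the sketch below, the 80% "approaching" ratio, the function names, and the group names are assumptions for illustration, not Unity AI Gateway settings. Consistent with the text, nothing is blocked; the code only reports which groups need an alert.

```python
from enum import Enum

class BudgetStatus(Enum):
    OK = "ok"
    APPROACHING = "approaching"
    EXCEEDED = "exceeded"

def budget_status(spend_usd: float, monthly_budget_usd: float, warn_ratio: float = 0.8) -> BudgetStatus:
    """Classify month-to-date spend against a monthly threshold (alert only, no blocking).
    The 0.8 default warning ratio is an assumption for this sketch."""
    if monthly_budget_usd <= 0:
        raise ValueError("budget must be positive")
    ratio = spend_usd / monthly_budget_usd
    if ratio >= 1.0:
        return BudgetStatus.EXCEEDED
    if ratio >= warn_ratio:
        return BudgetStatus.APPROACHING
    return BudgetStatus.OK

def alerts(month_to_date: dict[str, float], budgets: dict[str, float]) -> dict[str, BudgetStatus]:
    """Return only the groups that need an alert, with their status."""
    out = {}
    for group, spend in month_to_date.items():
        if group not in budgets:
            continue  # no threshold configured for this group
        status = budget_status(spend, budgets[group])
        if status is not BudgetStatus.OK:
            out[group] = status
    return out
```

Moving to hard enforcement would mean turning an `EXCEEDED` status into a rejected request at the gateway instead of an alert.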

Pillar 4: Open and interoperable

Every enterprise AI governance strategy eventually faces the same forcing function: a new team picks a different framework, a new provider releases a better model. If your governance is built into today's tooling choices, you're on a treadmill: every new framework is a new integration, every new model is a new policy.

We recognized this, and it's why we took a different approach than most governance tools. Governance can't live only in the agent layer. It also needs to live in the data and services that agents access, whether those services are Databricks-managed or not. An agent built on LangGraph and one built on CrewAI both query the same Unity Catalog, invoke the same governed MCP servers, and flow through the same AI Gateway. The framework is irrelevant. Governance travels with the resources, not the code that calls them.

Open standards make this concrete. The Model Context Protocol (MCP) gives agents a universal protocol for tool connectivity: register an MCP server once in Unity Catalog, then invoke it from any framework with the same permissions and audit trail. Unity AI Gateway provides a single, governed endpoint for Databricks-hosted models, Azure OpenAI, AWS Bedrock, and Anthropic, with one policy, one audit trail, and one cost-attribution layer across providers. MLflow tracing auto-instruments LangChain, LlamaIndex, AutoGen, the OpenAI SDK, the Anthropic SDK, and more, landing traces in Unity Catalog as tables without custom instrumentation for each framework.
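The "one endpoint, one policy, one audit trail" idea can be sketched as a single routing function. Everything here is hypothetical: the endpoint names, the provider adapters (which would wrap each provider's SDK in practice), and the allowlist. What the sketch shows is that access checks and logging live in the gateway, so adding a provider means adding a route, not writing a new policy.

```python
from typing import Callable

# Hypothetical provider adapters; each would wrap that provider's SDK in practice.
def hosted_model(prompt: str) -> str:
    return f"[hosted] {prompt}"

def bedrock_model(prompt: str) -> str:
    return f"[bedrock] {prompt}"

# Illustrative endpoint names, not real Databricks endpoints.
ROUTES: dict[str, Callable[[str], str]] = {
    "hosted-chat": hosted_model,
    "bedrock-chat": bedrock_model,
}
ALLOWED = {("analytics", "hosted-chat"), ("analytics", "bedrock-chat"), ("support", "hosted-chat")}
AUDIT: list[dict] = []

def gateway(group: str, endpoint: str, prompt: str) -> str:
    """Single entry point: the same access rule and the same audit row for every provider."""
    allowed = (group, endpoint) in ALLOWED
    AUDIT.append({"group": group, "endpoint": endpoint, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{group} may not call {endpoint}")
    return ROUTES[endpoint](prompt)
```

Whichever framework produced the prompt, the call passes through the same check and lands in the same log.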

The end result is that governance becomes a property of your platform rather than something you rebuild for each new framework or model. Every agent you deploy, regardless of how it was built or which model powers it, accesses the same governed data, the same business logic, and the same permissions. You define the rules once, and every agent that comes after automatically picks them up.

Learn more 

The enterprises that get AI governance right won't just avoid incidents. They'll move faster than those that don't, because their teams trust the infrastructure underneath them, and trust removes the friction that slows everyone else down.

If you're building toward this, start with our free course on governing AI agents, download the Databricks AI Security Framework (DASF), and visit our AI governance webpage for additional resources. 
