AI Gateway

AI Gateway is a Databricks governance layer for LLM endpoints and MCP servers. It tracks usage, enforces rate limits, logs payloads, filters unsafe content and PII, and attributes cost. See the AI Gateway overview for a full product introduction. From your AppKit app, you call a governed endpoint with the Model Serving plugin. This page covers the AppKit wiring, the governance features, and the CLI for inspecting and provisioning endpoints.

Call a governed endpoint from AppKit

The Model Serving plugin handles the HTTP plumbing, auth, and streaming. Endpoint names come from environment variables at runtime, so the same code runs locally and in production.

Register the plugin

server/server.ts
import { createApp, server, serving } from "@databricks/appkit";

const AppKit = await createApp({
  plugins: [
    server(),
    serving({
      endpoints: {
        chat: { env: "DATABRICKS_SERVING_ENDPOINT_NAME" },
      },
    }),
  ],
});

chat is an alias you pick. The plugin resolves it at request time by reading DATABRICKS_SERVING_ENDPOINT_NAME. Bind the env var in app.yaml:

app.yaml
env:
  - name: DATABRICKS_SERVING_ENDPOINT_NAME
    valueFrom: serving-endpoint

When you deploy, Databricks Apps injects the endpoint name into the container. For local dev, set the env var in .env.
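For example, a .env entry for local development might look like the following; the endpoint name here is illustrative, so substitute one your workspace actually exposes:

```
# .env (local development only; not committed)
DATABRICKS_SERVING_ENDPOINT_NAME=databricks-claude-sonnet-4-6
```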

Stream from a React component

client/src/ChatPanel.tsx
import { useState } from "react";
import { useServingStream } from "@databricks/appkit-ui/react";

export function ChatPanel() {
  const [prompt, setPrompt] = useState("");
  const { stream, chunks, streaming, error, reset } = useServingStream(
    { messages: [{ role: "user", content: prompt }], max_tokens: 500 },
    { alias: "chat" },
  );

  return (
    <>
      <input value={prompt} onChange={(e) => setPrompt(e.target.value)} />
      <button onClick={() => stream()} disabled={streaming || !prompt}>
        Send
      </button>
      <button onClick={reset}>Clear</button>
      {chunks.map((chunk, i) => (
        <pre key={i}>{JSON.stringify(chunk)}</pre>
      ))}
      {error && <p>{error}</p>}
    </>
  );
}

The first argument is the request body. The second holds options, including the alias. The hook manages the SSE connection, aborts on unmount, and accumulates parsed chunks into state. For a non-streaming call, use useServingInvoke with the same shape.

For chat models, extract text from each chunk (typically chunk.choices?.[0]?.delta?.content) and concatenate for display. During development, rendering raw chunks as JSON confirms the shape before you build your display logic.
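As a sketch of that extraction step, the helper below assumes OpenAI-style chat chunks where the streamed text lives at choices[0].delta.content; the ChatChunk type and collectText name are illustrative, not part of the AppKit API:

```typescript
// Assumed shape of one streamed chunk from an llm/v1/chat endpoint.
type ChatChunk = {
  choices?: { delta?: { content?: string } }[];
};

// Concatenate the text deltas from accumulated chunks into display text.
// Chunks without a delta (e.g. role-only or final chunks) contribute "".
export function collectText(chunks: ChatChunk[]): string {
  return chunks
    .map((chunk) => chunk.choices?.[0]?.delta?.content ?? "")
    .join("");
}
```

In the component above, you would render collectText(chunks) instead of the raw JSON once you have confirmed the chunk shape.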

Call it from a route handler

For agent orchestration, pre/post-processing, or logging on the backend, call the plugin directly. The plugin's built-in HTTP routes run as the authenticated user by default. In a custom route handler like this one, call .asUser(req) explicitly to get the same per-user behavior.

server/server.ts
AppKit.server.extend((app) => {
  app.post("/api/summarize", async (req, res) => {
    const { text } = req.body;
    const result = await AppKit.serving("chat")
      .asUser(req)
      .invoke({
        messages: [
          { role: "system", content: "Summarize the text in two sentences." },
          { role: "user", content: text },
        ],
      });
    res.json(result);
  });
});

Named versus default mode

The examples above use named mode with an explicit alias. Omit the config to register a default alias backed by DATABRICKS_SERVING_ENDPOINT_NAME. Named mode scales to multiple endpoints (chat, classifier, embeddings) in the same app.
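A multi-endpoint registration might look like the sketch below; the embeddings alias and its env var name are illustrative, and each env var is bound in app.yaml the same way as the single-endpoint example:

```typescript
import { createApp, server, serving } from "@databricks/appkit";

const AppKit = await createApp({
  plugins: [
    server(),
    serving({
      endpoints: {
        // Each alias resolves its endpoint name from an env var at request time.
        chat: { env: "DATABRICKS_SERVING_ENDPOINT_NAME" },
        embeddings: { env: "EMBEDDINGS_ENDPOINT_NAME" },
      },
    }),
  ],
});
```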

Two AI Gateway surfaces

You might see AI Gateway in two places in your workspace:

  • Classic: features toggled on an existing Model Serving endpoint. Usage logs to system.serving.endpoint_usage. The Model Serving plugin calls these endpoints directly.

  • Beta standalone: a separate product with its own endpoints under the LLMs tab of the AI Gateway UI. Usage logs to system.ai_gateway.usage. The Model Serving plugin doesn't call these directly. For Databricks-hosted Beta endpoints, click View legacy endpoint in the workspace UI to get the underlying Model Serving endpoint name, then point the plugin at that.

Related pages:

  • AI Gateway landing

  • AI Gateway for LLM endpoints

  • Configure AI Gateway on model serving endpoints

Governance features

AI Gateway features vary by endpoint type. Configure them in the workspace UI or through the REST API (PUT /api/2.0/serving-endpoints/{name}/ai-gateway).

| Feature | What it does |
| --- | --- |
| Usage tracking | Records request and token counts to system.serving.endpoint_usage |
| Payload logging | Logs request and response payloads to Unity Catalog inference tables |
| Rate limits | QPM and TPM limits per user, group, or service principal |
| AI Guardrails | Safety filters (Llama Guard) and PII detection (Presidio) |
| Fallbacks | Route to backup endpoints on failure |
| Traffic splitting | Split traffic across multiple served entities |
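As a sketch, a minimal request body for the PUT call described above might enable usage tracking and a per-user rate limit. The field names follow the classic AI Gateway REST schema as of this writing; check the current REST API reference before relying on them:

```json
{
  "usage_tracking_config": { "enabled": true },
  "rate_limits": [
    { "calls": 100, "key": "user", "renewal_period": "minute" }
  ]
}
```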

See Configure AI Gateway on serving endpoints for the full configuration guide. For the newer standalone experience, see AI Gateway for LLM endpoints.

AI Gateway also governs MCP server access. AppKit apps don't configure this directly. It applies when an agent endpoint you call (for example ABMAS or a custom Python agent) routes to an MCP server internally. See custom agent endpoints.

List available endpoints

Use the CLI to see which endpoints your workspace exposes and which ones already have AI Gateway features configured.

databricks serving-endpoints list -o json
Options
| Option | Required | Description |
| --- | --- | --- |
| --limit | no | Maximum number of results to return |
| --debug | no | Enable debug logging |
| -o json | no | Output as JSON (default: text) |
| --target | no | Bundle target to use (if applicable) |
| --profile | no | Databricks CLI profile name |

Foundation Model API endpoints (prefixed databricks-) come with AI Gateway built in; for example, databricks-claude-sonnet-4-6. Availability varies by workspace.

Example output (truncated)
[
  {
    "ai_gateway": {
      "usage_tracking_config": { "enabled": true }
    },
    "config": {
      "served_entities": [
        {
          "foundation_model": {
            "display_name": "Claude Sonnet 4.6",
            "name": "system.ai.databricks-claude-sonnet-4-6"
          },
          "name": "databricks-claude-sonnet-4-6"
        }
      ]
    },
    "name": "databricks-claude-sonnet-4-6",
    "state": { "config_update": "NOT_UPDATING", "ready": "READY" },
    "task": "llm/v1/chat"
  }
]

Inspect an endpoint

databricks serving-endpoints get databricks-claude-sonnet-4-6 -o json
Options
| Option | Required | Description |
| --- | --- | --- |
| NAME | yes | Serving endpoint name |
| --debug | no | Enable debug logging |
| -o json | no | Output as JSON (default: text) |
| --target | no | Bundle target to use (if applicable) |
| --profile | no | Databricks CLI profile name |

Check for ai_gateway in the response to confirm AI Gateway is configured on the endpoint.

Query from the terminal

Useful for smoke-testing an endpoint before wiring it into your app.

databricks serving-endpoints query databricks-claude-sonnet-4-6 \
  --json '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'
Options
| Option | Required | Description |
| --- | --- | --- |
| NAME | yes | Serving endpoint name |
| --json | no | Inline JSON or @path/to/file.json with request body |
| --max-tokens | no | Max tokens for completions and chat endpoints |
| --temperature | no | Sampling temperature |
| --n | no | Number of candidates to generate |
| --stream | no | Enable streaming responses |
| --client-request-id | no | Request identifier for inference and usage tables |
| --debug | no | Enable debug logging |
| -o json | no | Output as JSON (default: text) |
| --target | no | Bundle target to use (if applicable) |
| --profile | no | Databricks CLI profile name |

Provision an endpoint

databricks serving-endpoints create my-model-endpoint \
  --json '{
    "config": {
      "served_entities": [
        {
          "name": "my-entity",
          "entity_name": "my-registered-model",
          "workload_size": "Small",
          "scale_to_zero_enabled": true
        }
      ]
    }
  }'
Options
| Option | Required | Description |
| --- | --- | --- |
| NAME | yes | Endpoint name (alphanumeric, dashes, underscores) |
| --json | yes | Inline JSON or @path/to/file.json with endpoint config |
| --route-optimized | no | Enable route optimization |
| --budget-policy-id | no | Budget policy to apply |
| --description | no | Endpoint description |
| --no-wait | no | Return immediately instead of waiting for NOT_UPDATING state |
| --timeout | no | Max time to wait for completion (default: 20m) |
| --debug | no | Enable debug logging |
| -o json | no | Output as JSON (default: text) |
| --target | no | Bundle target to use (if applicable) |
| --profile | no | Databricks CLI profile name |

Wait for the endpoint to reach READY state before querying it. For a step-by-step walkthrough, see the Create a Model Serving Endpoint template.
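If you script this wait, the readiness loop can be factored out as below. This is a minimal sketch: waitUntilReady and getState are hypothetical names, and the caller supplies getState (for example, a function that fetches the endpoint via the REST API and returns its state.ready field):

```typescript
// Poll until the caller-supplied getState() reports "READY", or give up.
export async function waitUntilReady(
  getState: () => Promise<string>,
  { attempts = 60, intervalMs = 10_000 }: { attempts?: number; intervalMs?: number } = {},
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    if ((await getState()) === "READY") return;
    // Endpoint is still provisioning or updating; wait before re-checking.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("endpoint did not reach READY state in time");
}
```

The defaults (60 attempts, 10 s apart) roughly match the CLI's 20-minute wait; tune them to your endpoint's cold-start time.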

Coding agent integrations

AI Gateway can also govern AI coding tools. Route requests from Cursor, Codex CLI, and Gemini CLI through a Databricks AI Gateway endpoint to get one invoice, one usage dashboard, and one place to manage permissions and rate limits across your organization.

To set up an integration, open AI Gateway in your workspace sidebar, go to the LLMs tab, and open the Coding agents section. Follow the tool-specific instructions (base URL, API key, model provider).

See Integrate with coding agents for the full walkthrough and the current list of supported tools.

Where to next

Try the AI Chat App to wire a governed endpoint into your app, or explore the other agent capabilities: Genie spaces or Custom agent endpoints.