Two ways to use this template
- With a coding agent: click "Copy prompt" below, then paste the prompt into Cursor, Claude Code, Codex, or any coding agent. The agent builds the app and asks questions along the way so the result is exactly what you want.
- Manually: follow the steps below to set things up at your own pace.
Query AI Gateway Endpoints
Access Databricks foundation models through AI Gateway endpoints for production-ready access with built-in governance, monitoring, and production-readiness features.
Prerequisites
Verify these Databricks workspace features are enabled before starting. If any check fails, ask your workspace admin to enable the feature.
- Databricks CLI authenticated. Run databricks auth profiles and confirm at least one profile shows Valid: YES. If none do, authenticate with databricks auth login --host <workspace-url> --profile <PROFILE>.
- AI Gateway (currently in Beta). AI Gateway is built into all Foundation Model API endpoints, but it is still a Beta feature; behavior and APIs can change. Confirm availability by listing endpoints and checking the config: databricks serving-endpoints list --profile <PROFILE> should return at least one databricks-* foundation-model endpoint, and databricks serving-endpoints get <endpoint-name> --profile <PROFILE> -o json | grep -q '"ai_gateway"' && echo ok should print ok. Endpoint availability varies by workspace and region.
1. Understand AI Gateway endpoints
AI Gateway is a governance layer on top of model serving endpoints that provides permissions, rate limiting, payload logging, and AI guardrails. Currently in beta, AI Gateway is becoming the default way to access foundation models in Databricks.
Note: AI Gateway is built into all Foundation Model API endpoints. If you need to access non-AI Gateway endpoints, use the Databricks SDK's servingEndpoints.query() method directly.
2. Check if AI Gateway is available
All Foundation Model API endpoints have AI Gateway built-in. To verify, check if a known FM endpoint has the ai_gateway configuration:
databricks serving-endpoints get <your-endpoint> --profile <PROFILE> --output json | grep -q '"ai_gateway"' && echo "✓ AI Gateway available" || echo "✗ No AI Gateway"
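If you prefer to run this check from app code, here is a minimal TypeScript sketch. It assumes the workspace client exposes a servingEndpoints.get method that mirrors the CLI's serving-endpoints get command; confirm the exact signature against the SDK version you are using.

import { getWorkspaceClient } from "@databricks/appkit";

const workspaceClient = getWorkspaceClient({});

// Sketch: true if the endpoint's config contains an ai_gateway block.
// Assumes servingEndpoints.get mirrors the CLI's `serving-endpoints get`.
async function hasAiGateway(endpointName: string): Promise<boolean> {
  const config = await workspaceClient.servingEndpoints.get({
    name: endpointName,
  });
  return Boolean((config as any).ai_gateway);
}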
3. Choose your model
List available AI Gateway endpoints in your workspace:
databricks serving-endpoints list --profile <PROFILE>
Common AI Gateway endpoint names:
- databricks-meta-llama-3-3-70b-instruct
- databricks-gemini-3-1-flash-lite
- databricks-dbrx-instruct
Note: When using this template with a coding agent, specify which endpoint to use based on what's available in your workspace. Endpoint names may vary.
Important: Endpoint availability varies by workspace. Always run databricks serving-endpoints list to check what's available before configuring your app.
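You can also discover endpoints from app code. The sketch below assumes servingEndpoints.list returns an async iterable of endpoint summaries with a name field, mirroring the CLI output; treat that as an assumption to verify against your SDK version.

import { getWorkspaceClient } from "@databricks/appkit";

const workspaceClient = getWorkspaceClient({});

// Sketch: collect foundation-model endpoint names from the workspace.
// Assumes servingEndpoints.list() yields summaries with a `name` field.
async function listFoundationModelEndpoints(): Promise<string[]> {
  const names: string[] = [];
  for await (const ep of workspaceClient.servingEndpoints.list({})) {
    if (ep.name?.startsWith("databricks-")) {
      names.push(ep.name);
    }
  }
  return names;
}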
4. Configure environment variables
For local development (.env):
DATABRICKS_ENDPOINT=<your-endpoint>
For deployment (app.yaml):
env:
  - name: DATABRICKS_ENDPOINT
    value: "<your-endpoint>"
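However the endpoint name is supplied, it helps to fail fast when the variable is missing rather than discover it at request time. A minimal sketch:

// Fail fast if the endpoint variable is missing instead of querying
// an undefined endpoint name at request time.
const endpointName = process.env.DATABRICKS_ENDPOINT;
if (!endpointName) {
  throw new Error(
    "DATABRICKS_ENDPOINT is not set; add it to .env locally or app.yaml for deployment.",
  );
}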
5. Query AI Gateway endpoints
import { getWorkspaceClient } from "@databricks/appkit";
// {} tells the SDK to use default auth chain (env vars / profile).
// Do NOT omit. getWorkspaceClient() with no argument will throw.
const workspaceClient = getWorkspaceClient({});
const endpoint = process.env.DATABRICKS_ENDPOINT || "<your-endpoint>";
// Send chat messages to the endpoint; messages use the OpenAI chat format.
async function queryModel(messages: any[]) {
  const result = await workspaceClient.servingEndpoints.query({
    name: endpoint,
    messages: messages,
    max_tokens: 1000,
  });
  return result;
}
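A usage sketch for the helper above. The exact response shape depends on the model; here it is assumed to be OpenAI-compatible, with a choices array holding the completion.

// Usage sketch: one round trip through queryModel.
// Response shape is assumed OpenAI-compatible (choices[0].message.content).
const reply = await queryModel([
  { role: "user", content: "Summarize AI Gateway in one sentence." },
]);
console.log(reply.choices?.[0]?.message?.content);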
For streaming responses from OpenAI-compatible models, use the Vercel AI SDK's createOpenAI provider with AI Gateway:
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";
const databricks = createOpenAI({
  baseURL: `https://${process.env.DATABRICKS_WORKSPACE_ID}.ai-gateway.cloud.databricks.com/mlflow/v1`,
  apiKey: token, // bearer token from your auth helper (see the auth note below)
});

const result = streamText({
  model: databricks.chat(endpoint), // e.g., "databricks-gpt-5-4-mini"
  messages,
  maxOutputTokens: 1000,
});

// AI SDK v6: pipe the text stream to the Express response
result.pipeTextStreamToResponse(res);
Auth for streaming: The streaming example above requires a bearer token for createOpenAI(). See the Streaming AI Chat template for the full auth helper pattern using @databricks/sdk-experimental.
Note: This pattern works with OpenAI-compatible models (databricks-gpt-5-4-mini, databricks-gpt-oss-120b). Native Databricks models use the MLflow unified API.
Workspace ID: AppKit auto-discovers this at runtime. For explicit setup, run databricks api get /api/2.1/unity-catalog/current-metastore-assignment --profile <PROFILE> and use the workspace_id field.
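Wiring the streaming pieces into an Express route might look like the sketch below. The getToken helper is hypothetical, standing in for the auth pattern referenced above; everything else reuses the names from the streaming snippet.

import express from "express";
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

// Hypothetical auth helper standing in for the pattern from the
// Streaming AI Chat template (see the auth note above).
declare function getToken(): Promise<string>;

const app = express();
app.use(express.json());

const endpoint = process.env.DATABRICKS_ENDPOINT || "<your-endpoint>";

// Sketch: wrap the streaming call in an Express route.
app.post("/api/chat", async (req, res) => {
  const databricks = createOpenAI({
    baseURL: `https://${process.env.DATABRICKS_WORKSPACE_ID}.ai-gateway.cloud.databricks.com/mlflow/v1`,
    apiKey: await getToken(),
  });
  const result = streamText({
    model: databricks.chat(endpoint),
    messages: req.body.messages,
    maxOutputTokens: 1000,
  });
  // Stream tokens back to the client as they arrive.
  result.pipeTextStreamToResponse(res);
});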
See the Streaming AI Chat template for a complete implementation.
6. Test the endpoint
Query an AI Gateway endpoint:
databricks serving-endpoints query <your-endpoint> \
--json '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}' \
--profile <PROFILE>
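The same smoke test can run from app code, reusing the queryModel helper and endpoint variable from step 5. As above, the response shape is assumed to be OpenAI-compatible.

// Programmatic smoke test mirroring the CLI query above.
const smoke = await queryModel([{ role: "user", content: "Hello" }]);
if (!smoke.choices?.length) {
  throw new Error(`No completion returned from ${endpoint}`);
}
console.log("✓ endpoint responded");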
References
- AI Gateway Overview
- AI Gateway and Serving Endpoints
- Vercel AI SDK - For streaming implementations