Two ways to use this template
- Use a coding agent. Click "Copy prompt" below, then paste the prompt into Cursor, Claude Code, Codex, or any coding agent. The agent builds the app and asks questions along the way so the result is exactly what you want.
- Set things up manually. Follow the steps below at your own pace.
Create a Databricks Model Serving endpoint
Create and validate a Databricks Model Serving endpoint for AI chat inference in Databricks Apps.
Prerequisites
Verify these Databricks workspace features are enabled before starting. If any check fails, ask your workspace admin to enable the feature.
- Databricks CLI authenticated. Run `databricks auth profiles` and confirm at least one profile shows `Valid: YES`. If none do, authenticate with `databricks auth login --host <workspace-url> --profile <PROFILE>`.
- Model Serving enabled. Run `databricks serving-endpoints list --profile <PROFILE>` and confirm the command succeeds (an empty list is fine; you are about to create an endpoint). A permission or `not enabled` error means Model Serving is not available to this identity.
- Permission to create serving endpoints. The Databricks CLI call in Step 3 requires the `CAN_MANAGE` serving-endpoint permission on the workspace. If `databricks serving-endpoints create` returns `PERMISSION_DENIED`, ask your admin to grant it.
- A foundation model or registered MLflow model to serve. List foundation-model entities available to your workspace with `databricks serving-endpoints get-open-api --profile <PROFILE> -o json`. If you plan to serve a registered Unity Catalog model instead, confirm it exists in the Databricks UI under Models before running `databricks serving-endpoints create`.
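To run the CLI checks in one go, here is a minimal bash sketch; the commands are exactly the ones listed above, and `<PROFILE>` is a placeholder you replace with an authenticated profile name:

# Prerequisite smoke test; replace <PROFILE> with an authenticated profile name.
PROFILE="<PROFILE>"

databricks auth profiles   # expect at least one profile with Valid: YES
if databricks serving-endpoints list --profile "$PROFILE" -o json > /dev/null; then
  echo "Model Serving is reachable for this identity"
else
  echo "Model Serving check failed; ask your workspace admin"
fi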
1. Choose an endpoint name
Pick a descriptive endpoint name for your app or feature.
Examples:
- `support-assistant`
- `analytics-copilot`
2. List available foundation models
databricks serving-endpoints get-open-api \
--profile <PROFILE> \
-o json
If your workspace uses a curated endpoint catalog, list available endpoints first:
databricks serving-endpoints list --profile <PROFILE> -o json
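To see just the endpoint names, one option is to filter the JSON with jq. This assumes jq is installed and that the list command prints a JSON array of endpoint objects, each with a name field:

databricks serving-endpoints list --profile <PROFILE> -o json | jq -r '.[].name'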
3. Create a serving endpoint
Create an endpoint with a served entity, using a model name your workspace supports.
databricks serving-endpoints create <endpoint-name> \
--config '{
"served_entities": [
{
"name": "<entity-name>",
"entity_name": "<foundation-model-or-registered-model>",
"entity_version": "<version-if-required>",
"workload_size": "Small",
"scale_to_zero_enabled": true
}
]
}' \
--profile <PROFILE>
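For concreteness, here is the same call with every placeholder filled in. The endpoint name support-assistant comes from Step 1; the Unity Catalog model main.models.support_bot at version 1 is purely hypothetical, so substitute your own entity:

databricks serving-endpoints create support-assistant \
  --config '{
    "served_entities": [
      {
        "name": "support-bot",
        "entity_name": "main.models.support_bot",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  }' \
  --profile <PROFILE>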
4. Wait until the endpoint is ready
databricks serving-endpoints get <endpoint-name> --profile <PROFILE> -o json
Check that the endpoint is in a READY state before connecting your app.
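Creation can take several minutes while compute is provisioned, so it is convenient to poll rather than re-run the command by hand. A minimal sketch, assuming jq is installed and that the readiness flag appears at .state.ready in the JSON output:

# Poll every 30 seconds until the endpoint reports READY.
ENDPOINT="<endpoint-name>"
until [ "$(databricks serving-endpoints get "$ENDPOINT" --profile <PROFILE> -o json | jq -r '.state.ready')" = "READY" ]; do
  echo "waiting for $ENDPOINT ..."
  sleep 30
done
echo "$ENDPOINT is READY"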
5. Test the endpoint directly
Use the OpenAI-compatible chat completions API exposed by Databricks:
curl -sS \
-H "Authorization: Bearer <TOKEN>" \
-H "Content-Type: application/json" \
"https://<workspace>.cloud.databricks.com/serving-endpoints/<endpoint-name>/invocations" \
-d '{
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Say hello in one short sentence." }
],
"max_tokens": 64
}'
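Chat endpoints return the OpenAI chat completions response shape, so you can pipe the output through jq to print only the assistant's reply (assumes jq is installed):

curl -sS \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  "https://<workspace>.cloud.databricks.com/serving-endpoints/<endpoint-name>/invocations" \
  -d '{"messages":[{"role":"user","content":"Say hello in one short sentence."}],"max_tokens":64}' \
  | jq -r '.choices[0].message.content'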
6. Add endpoint name to app config
Set the endpoint name in app.yaml:
env:
- name: DATABRICKS_SERVING_ENDPOINT
value: "<endpoint-name>"
For local development, mirror this in `.env` (for example `DATABRICKS_SERVING_ENDPOINT=support-assistant`, matching the endpoint name you chose in Step 1).
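To sanity-check the local wiring, you can load .env into the shell and reuse the curl test from Step 5. This sketch assumes a plain KEY=value .env plus two extra variables, DATABRICKS_HOST and DATABRICKS_TOKEN, which are assumptions here rather than part of the template:

# Export every variable in .env (plain KEY=value lines), then hit the endpoint.
set -a; source .env; set +a

curl -sS \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  "$DATABRICKS_HOST/serving-endpoints/$DATABRICKS_SERVING_ENDPOINT/invocations" \
  -d '{"messages":[{"role":"user","content":"ping"}],"max_tokens":16}'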