
Two ways to use this template

Use with your coding agent
  1. Click "Copy prompt" below
  2. Paste into Cursor, Claude Code, Codex, or any coding agent
  3. Your agent builds the app and asks questions along the way so the result is exactly what you want
or
Read step-by-step

Follow the steps below to set things up manually, at your own pace.

Create a Databricks Model Serving endpoint

Create and validate a Databricks Model Serving endpoint for AI chat inference in Databricks Apps.

Prerequisites

Verify these Databricks workspace features are enabled before starting. If any check fails, ask your workspace admin to enable the feature.

  • Databricks CLI authenticated. Run databricks auth profiles and confirm at least one profile shows Valid: YES. If none do, authenticate with databricks auth login --host <workspace-url> --profile <PROFILE>.
  • Model Serving enabled. Run databricks serving-endpoints list --profile <PROFILE> and confirm the command succeeds (an empty list is fine; you are about to create an endpoint). A PERMISSION_DENIED or "not enabled" error means Model Serving is not available to this identity.
  • Permission to create serving endpoints. The Databricks CLI call in Step 3 requires permission to create serving endpoints in your workspace. If databricks serving-endpoints create returns PERMISSION_DENIED, ask your admin to grant access.
  • A foundation model or registered MLflow model to serve. Pay-per-token foundation model endpoints appear in the output of databricks serving-endpoints list --profile <PROFILE>. If you plan to serve a registered Unity Catalog model instead, confirm it exists in the Databricks UI under Models before running databricks serving-endpoints create.
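The authentication prerequisite can be scripted as a quick sanity check. This is a sketch: the `profiles` variable below stands in for live `databricks auth profiles` output, since the exact table layout may vary by CLI version; in practice, capture the live command instead.

```bash
# Sketch: check that at least one CLI profile reports Valid: YES.
# The variable below mimics `databricks auth profiles` table output.
profiles='Name     Host                                    Valid
DEFAULT  https://adb-1234.5.azuredatabricks.net  YES'

if echo "$profiles" | awk 'NR>1 {print $NF}' | grep -q '^YES$'; then
  echo "At least one profile is authenticated"
else
  echo "No valid profile; run: databricks auth login --host <workspace-url>"
fi
```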


1. Choose an endpoint name

Pick a descriptive endpoint name for your app or feature.

Examples:

  • support-assistant
  • analytics-copilot

2. List available models and endpoints

List the serving endpoints already available in your workspace; pay-per-token foundation model endpoints appear here alongside any custom endpoints:

```bash
databricks serving-endpoints list --profile <PROFILE> -o json
```

To inspect the query schema of a specific existing endpoint, fetch its OpenAPI spec (note that get-open-api requires an endpoint name):

```bash
databricks serving-endpoints get-open-api <endpoint-name> --profile <PROFILE>
```
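To scan the list for candidate endpoint names, a small jq filter helps. This sketch assumes jq is installed and, for illustration, runs against a captured sample response rather than a live call:

```bash
# Sketch: extract just the endpoint names from the JSON returned by
# `databricks serving-endpoints list -o json`. The sample response below
# is illustrative; pipe the live command's output through the same filter.
sample='[{"name":"support-assistant","state":{"ready":"READY"}},{"name":"analytics-copilot","state":{"ready":"NOT_READY"}}]'

echo "$sample" | jq -r '.[].name'
# prints:
# support-assistant
# analytics-copilot
```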

3. Create a serving endpoint

Create an endpoint with a served model using the workspace-supported model name.

```bash
databricks serving-endpoints create <endpoint-name> \
  --config '{
    "served_entities": [
      {
        "name": "<entity-name>",
        "entity_name": "<foundation-model-or-registered-model>",
        "entity_version": "<version-if-required>",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  }' \
  --profile <PROFILE>
```

4. Wait until the endpoint is ready

```bash
databricks serving-endpoints get <endpoint-name> --profile <PROFILE> -o json
```

Check that the endpoint is in a READY state before connecting your app.
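The READY check can be scripted. This sketch assumes jq is installed and runs against a captured `get` response; replace the sample with the live command's output:

```bash
# Sketch: read the readiness field from the `get` response. Live call:
#   databricks serving-endpoints get <endpoint-name> --profile <PROFILE> -o json
response='{"name":"support-assistant","state":{"ready":"READY","config_update":"NOT_UPDATING"}}'

state=$(echo "$response" | jq -r '.state.ready')
if [ "$state" = "READY" ]; then
  echo "Endpoint is READY"
else
  echo "Endpoint state: $state; wait and re-check"
fi
```

In a script you could wrap this check in a loop with a sleep between retries; new endpoints can take a while to provision.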

5. Test the endpoint directly

Use the OpenAI-compatible chat completions API exposed by Databricks:

```bash
curl -sS \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  "https://<workspace>.cloud.databricks.com/serving-endpoints/<endpoint-name>/invocations" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Say hello in one short sentence." }
    ],
    "max_tokens": 64
  }'
```
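The response follows the OpenAI chat-completions shape, so the assistant's reply can be pulled out with jq. A sketch against a trimmed sample response; pipe the live curl output through the same filter:

```bash
# Sketch: extract the assistant's text from a chat-completions response.
response='{"choices":[{"message":{"role":"assistant","content":"Hello there!"}}]}'

echo "$response" | jq -r '.choices[0].message.content'
# prints: Hello there!
```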

6. Add endpoint name to app config

Set the endpoint name in app.yaml:

```yaml
env:
  - name: DATABRICKS_SERVING_ENDPOINT
    value: "<endpoint-name>"
```

For local development, mirror this in .env (for example DATABRICKS_SERVING_ENDPOINT=support-assistant).
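One common way to load the .env file into a local shell session is allexport sourcing. A sketch; it assumes a simple KEY=VALUE file with no quoting edge cases:

```bash
# Sketch: write a minimal .env, then source it with allexport enabled so
# child processes (your locally run app) inherit the variable.
printf 'DATABRICKS_SERVING_ENDPOINT=support-assistant\n' > .env

set -a        # export every variable assigned while sourcing
. ./.env
set +a

echo "$DATABRICKS_SERVING_ENDPOINT"
# prints: support-assistant
```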
