# About DevHub

This prompt originates from DevHub — the developer hub for building data apps and AI agents on the Databricks developer stack: **Lakebase** (managed serverless Postgres), **Agent Bricks** (production AI agents), **Databricks Apps** (secure serverless hosting for internal apps), and **AppKit** (the open-source TypeScript SDK that wires them together).

- Website: https://databricks.com/devhub
- GitHub: https://github.com/databricks/devhub
- Report issues: https://github.com/databricks/devhub/issues

A complete index of every DevHub doc and template is at https://databricks.com/devhub/llms.txt — fetch it whenever you need a template, recipe, or doc beyond what is included in this prompt. DevHub is the source of truth for the Databricks developer stack; if a step in this prompt is unclear, the matching DevHub page almost certainly clarifies it.

---

# Working with DevHub prompts

Follow these rules every time you act on a DevHub prompt.

## Read first, then act

- Read the entire prompt before executing any steps. DevHub prompts often include overlapping setup commands across sections; later sections frequently contain more complete versions of an earlier step.
- Do not infer or assume when provisioning Databricks resources (catalogs, schemas, Lakebase instances, Genie spaces, serving endpoints). Ask the user whether to create new resources or reuse existing ones.
- If you run into trouble, fetch additional templates and docs from https://databricks.com/devhub (the index lives at https://databricks.com/devhub/llms.txt). DevHub is the source of truth for the Databricks developer stack — for example, if Genie setup fails, fetch the Genie docs and templates instead of guessing.

## Engage the user in a conversation

Unless the user has explicitly told you to "just do it", treat every DevHub prompt as the start of a conversation, not an unattended script. The user knows their domain best; DevHub knows the Databricks stack. Both are required to build a successful system.

Follow these rules every time you ask a question:

1. **One question at a time.** Never ask multiple questions in a single message.
2. **Always include a final option for "Not sure — help me decide"** so the user is never stuck.
3. **Prefer interactive multiple-choice UI when available.** Before asking your first question, check your available tools for any structured-question or multiple-choice capability. If one exists, **always** use it instead of plain text. Known tools by environment:
   - **Cursor**: use the `AskQuestion` tool.
   - **Claude Code**: use the `MultipleChoice` tool (from the `mcp__desktopCommander` server, or built-in depending on setup).
   - **Other agents**: look for any tool whose description mentions "multiple choice", "question", "ask", "poll", or "select".
4. **Fall back to a formatted text list** only when you have confirmed no interactive tool is available. Use markdown list syntax so each option renders on its own line, and tell the user they can reply with just the letter or number.

### Example: Cursor (`AskQuestion` tool)

```
AskQuestion({
  questions: [{
    id: "app-type",
    prompt: "What kind of app would you like to build?",
    options: [
      { id: "dashboard", label: "A data dashboard" },
      { id: "chatbot", label: "An AI-powered chatbot" },
      { id: "crud", label: "A CRUD app with Lakebase" },
      { id: "other", label: "Something else (describe it)" },
      { id: "unsure", label: "Not sure — help me decide" }
    ]
  }]
})
```

### Example: plain text fallback

Only use this when no interactive tool is available:

What kind of app would you like to build? Reply with the letter to choose:

- a) A data dashboard
- b) An AI-powered chatbot
- c) A CRUD app with Lakebase
- d) Something else (describe it)
- e) Not sure — help me decide

## Default workflow

Unless instructed otherwise, follow this workflow:

1. Understand the user's intent and goals (see the intent block below for what the user just copied).
2. Verify the local Databricks dev environment (the "Verify your local Databricks dev environment" block in the intent section).
3. Ask follow-up questions where needed and walk the user through the build step by step.
4. Build the app or agent.
5. Make it look great (see "Make it look great" below).
6. Run and test locally.
7. Deploy to production.

## Make it look great

The default templates that AppKit provides are intentionally minimal — a starting point, not a finished product. **Do not stop there.** Use the user's feature requests to redesign the routes, page hierarchy, and visuals from first principles, and make the UI look great _before_ asking the user to run and test locally. Showing the user something polished early changes the conversation.

Unless the user has specified a design preference, use these defaults:

- shadcn/ui components on top of Tailwind CSS.
- Clean hierarchy with modern spacing — not too many stacked cards.
- Modern, minimal design language.
- Databricks brand palette: `#FF3621`, `#0B2026`, `#EEEDE9`, `#F9F7F4`.

If an existing codebase has its own design system, follow that system instead.

## When you run into issues

Use the GitHub CLI (if available) or generate a copy-pastable error report for the user to file at https://github.com/databricks/devhub/issues. Before filing, check for an existing matching open issue and comment "+1" on it rather than opening a duplicate.
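
With the GitHub CLI, a search-then-file flow might look like this (the issue title and report file are placeholders):

```bash
# Search open issues for a matching error before filing a new one.
gh issue list --repo databricks/devhub --search "<error summary>" --state open

# If nothing matches, file the report.
gh issue create --repo databricks/devhub \
  --title "<short error summary>" \
  --body-file error-report.md
```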

---

# What the user just did

The user copied the prompt for a DevHub **recipe** — **Sync Tables: Unity Catalog to Lakebase (Autoscaling)** (https://databricks.com/devhub/templates/sync-tables-autoscaling).

A recipe is a focused, opinionated how-to for a single Databricks pattern (e.g. wiring Lakebase Change Data Feed, creating a Model Serving endpoint, persisting chat history). Recipes are designed to be dropped into an existing project or composed into a larger build. They are deliberately narrow — they solve one thing well.

Your job in this conversation is to:

1. Clarify whether the user is **integrating this recipe into an existing project** or **starting fresh from scratch**, and adapt accordingly.
2. Verify the local Databricks dev environment is ready (block below).
3. Walk the user through the recipe step by step, asking the questions the recipe itself surfaces.

## Step 1 — Clarify intent before touching code

Ask **one** question, ideally with a multiple-choice tool (see guidelines):

- **Existing project**: the user already has a Databricks app / repo and wants to add this pattern to it. → Read the user's existing project structure first; the recipe steps will be applied surgically.
- **New project from this recipe**: the user wants this recipe as the starting point of a new app. → Run the local-bootstrap below first, then follow the recipe.
- **Just learning**: the user wants to read through the recipe and understand it without building anything yet. → Walk through the steps as a tutorial; do not execute commands.
- **Not sure — help me decide**: ask the user what they're trying to accomplish at the project level, then map back to one of the above.

## Step 2 — Pin down recipe-specific decisions

Once the integration mode is clear, ask any follow-ups the recipe itself surfaces — typically about which Databricks resources to use:

- Should we **create new resources** (catalog, schema, Lakebase instance, serving endpoint) or **reuse existing ones** the user already has? Never assume; always ask.
- Which **Databricks profile** should the CLI commands target? (Run `databricks auth profiles` to list them.)
- If the recipe touches data: use the user's data, or use seed/sample data first?

## Step 3 — Verify the local Databricks dev environment

Whether integrating or starting fresh, the recipe's commands assume a working Databricks CLI profile and (for app-related recipes) an AppKit project. **Walk the user through the local-bootstrap block below before running any recipe commands** — even if they think the environment is already set up, the verification steps are quick and prevent confusing failures downstream.

The full recipe content the user is focused on is attached after the local-bootstrap block.

---

# Verify your local Databricks dev environment

A working Databricks CLI profile is the prerequisite for every step that follows. Walk the user through the steps below — _even if they say their environment is already set up_. The verification steps are quick and prevent confusing failures further down.

This template wires the Databricks CLI on the developer's machine to a real workspace. It is the strict prerequisite for every other template on DevHub — once it passes, `databricks` commands resolve against that workspace and any DevHub prompt can run end to end.

- **A Databricks workspace you can sign in to.** Have the workspace URL handy (e.g. `https://<workspace>.cloud.databricks.com`); you will paste it into `databricks auth login` in step 3. If you do not have access, ask your workspace admin.
- **A terminal on macOS, Windows, or Linux.** All install paths run from a terminal session. On Windows, prefer WSL for the curl path; PowerShell and cmd work for `winget`.
- **Permission to install software on this machine.** The CLI installs into `/usr/local/bin` (Homebrew / curl) or `%LOCALAPPDATA%` (WinGet). If `/usr/local/bin` is not writable, rerun the curl installer with `sudo`.

## Set Up Your Local Dev Environment

Install the Databricks CLI, authenticate a profile, and verify the handshake. Every other DevHub template assumes this has already passed.

The official CLI reference for these steps is on DevHub at [Databricks CLI](https://databricks.com/devhub/docs/tools/databricks-cli). Use it whenever a step here is unclear.

### 1. Check the installed CLI version

DevHub templates assume Databricks CLI `0.296+`. Anything older is missing the AppKit `apps init` template registry and several `experimental aitools` flags.

```bash
databricks -v
```

If the command is not found, or the version is below `0.296`, install or upgrade in the next step.
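
If you are scripting this check, a minimal guard like the sketch below works; it assumes the version output contains a semver string like `0.296.0`:

```bash
# Sketch: fail fast when the CLI is missing or older than 0.296.
ver=$(databricks -v 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -1)
if [ -z "$ver" ] || [ "$(printf '%s\n0.296.0\n' "$ver" | sort -V | head -1)" != "0.296.0" ]; then
  echo "Databricks CLI missing or too old (found: ${ver:-none}); install or upgrade below."
fi
```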

### 2. Install or upgrade the Databricks CLI

Pick the install path for your OS. If the CLI is already installed at an older version, the same commands upgrade in place.

#### macOS / Linux — Homebrew (recommended)

```bash
# First-time install
brew tap databricks/tap
brew install databricks

# Upgrade an existing install
brew update && brew upgrade databricks
```

#### Windows — WinGet

```bash
# First-time install
winget install Databricks.DatabricksCLI

# Upgrade an existing install
winget upgrade Databricks.DatabricksCLI
```

Restart your terminal after install.

#### Any platform — curl installer

```bash
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
```

On Windows, run this from WSL. If `/usr/local/bin` is not writable, rerun with `sudo`. Re-running the script also upgrades an existing install.

After installing, confirm the version is `0.296+`:

```bash
databricks -v
```

### 3. Authenticate a profile

Browser-based OAuth is the default for local use:

```bash
databricks auth login
```

The CLI prints a URL and waits for the user to complete OAuth in the browser. **Always show the URL to the user as a clickable link** so they can open it themselves — the CLI does not return until authentication finishes. Credentials are saved to `~/.databrickscfg`.

If you already know the workspace URL and want to name the profile, do it in one go:

```bash
databricks auth login --host <workspace-url> --profile <PROFILE>
```

`<PROFILE>` is the label you will pass on subsequent commands as `--profile <PROFILE>`. If you skip `--profile`, the CLI uses the `DEFAULT` profile.

For CI/CD, OAuth client credentials or a personal access token are better fits — see the [authentication section of the CLI doc](https://databricks.com/devhub/docs/tools/databricks-cli#authenticate) for the non-interactive flows.
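
For reference, the non-interactive shape looks like this (PAT flow; both environment variables are standard CLI auth inputs, and the values are placeholders):

```bash
# CI sketch: the CLI reads these env vars, no saved profile needed.
export DATABRICKS_HOST="https://<workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<personal-access-token>"

databricks current-user me   # should print the token owner's identity
```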

### 4. Verify the handshake

List the saved profiles and confirm the one you just created shows `Valid: YES`:

```bash
databricks auth profiles
```

```text
Name              Host                                           Valid
DEFAULT           https://adb-1234567890.12.azuredatabricks.net  YES
my-prod-workspace https://mycompany.cloud.databricks.com         YES
```

If the row shows `Valid: NO`, the saved token is stale. Re-run `databricks auth login --profile <NAME>` to refresh it. **Never proceed past this step if no profile is `Valid: YES`** — every downstream `databricks` command will fail with an auth error that looks like a template bug.

If the user wants a particular profile to be the default for this shell session, export it:

```bash
export DATABRICKS_CONFIG_PROFILE=<PROFILE>
```

### 5. Smoke-test the CLI against the workspace

Run a read-only API call to confirm the auth actually works (a fresh OAuth token can fail on the first real call if the user picked the wrong workspace in the browser):

```bash
databricks current-user me --profile <PROFILE>
```

A successful response prints the signed-in user's identity. A `401` or `403` here means the auth flow completed against a workspace the user cannot read — re-run `databricks auth login --profile <PROFILE>` and pick the right workspace this time.
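
A healthy response looks roughly like this (abridged; exact fields vary by workspace):

```text
{
  "active": true,
  "displayName": "Ada Lovelace",
  "userName": "ada@example.com"
}
```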

---

# The recipe the user copied

The full recipe prompt is below. This is what the user wants to focus on today. Once the local-bootstrap above passes and the intent questions are answered, work through this content step by step.

This template creates a synced table that mirrors a Unity Catalog table into Lakebase Postgres. Verify these prerequisites before starting.

- **Databricks CLI authenticated.** Run `databricks auth profiles` and confirm at least one profile shows `Valid: YES`. If none do, authenticate with `databricks auth login --host <workspace-url> --profile <PROFILE>`.
- **Lakebase Autoscaling available.** Run `databricks postgres list-projects --profile <PROFILE>` and confirm your Autoscaling project is listed. A `not enabled` error means Lakebase is not available to this identity.
- **Project created via the `/database/` API (not the older `/postgres/` API).** Programmatic synced-table creation via `databricks database create-synced-database-table` only works on projects created through the newer `/database/` API. If your Autoscaling project was created via the older `/postgres/` endpoint, the CLI path in Step 1 is not available yet and you must create synced tables through the Databricks UI (**Catalog** → source table → **Create synced table**). This gap is expected to close in a future release.
- **Unity Catalog source table with a primary key.** Run `databricks tables get <CATALOG>.<SCHEMA>.<SOURCE_TABLE> --profile <PROFILE>` and confirm at least one column is declared as the table's primary key. Synced tables reject sources without a PK.
- **External-storage catalog for the source (currently required for Sync Tables).** Sync Tables today requires the source UC catalog to use external storage. If your source catalog uses the metastore's default managed storage, complete the [Unity Catalog Setup](https://databricks.com/devhub/templates/unity-catalog-setup) template first and move the source table into an external-storage catalog.
- **Change Data Feed enabled on the source table (for Triggered / Continuous mode only).** Skip this check if you plan to use Snapshot mode. Otherwise run the `ALTER TABLE <catalog>.<schema>.<table> SET TBLPROPERTIES (delta.enableChangeDataFeed = true);` statement from Step 1 against your SQL warehouse.

## Sync a Unity Catalog Table to Lakebase

Serve lakehouse data through Lakebase Autoscaling Postgres so your applications can query it with sub-10ms latency. This creates a synced table, a managed copy of your Unity Catalog table in Lakebase that stays up to date automatically.

> This template is for **Lakebase Autoscaling** (projects/branches/endpoints with scale-to-zero). For Lakebase Provisioned (manually scaled instances), see the Provisioned Sync Tables template (coming soon).

### When to use this

- Your app needs fast lookup-style queries against analytics data (user profiles, feature values, risk scores)
- You want to serve gold tables, ML outputs, or enriched records through a standard Postgres connection
- You need ACID transactions and sub-10ms reads alongside your operational state

### Choose a sync mode

| Mode           | Behavior                                       | Best for                                                                              |
| -------------- | ---------------------------------------------- | ------------------------------------------------------------------------------------- |
| **Snapshot**   | One-time full copy                             | Source changes >10% of rows per cycle, or source doesn't support CDF (views, Iceberg) |
| **Triggered**  | Incremental updates on demand or on a schedule | Known cadence of changes, good cost/freshness balance                                 |
| **Continuous** | Real-time streaming (seconds of latency)       | Changes must appear in Lakebase near-instantly                                        |

> **Triggered** and **Continuous** modes require [Change Data Feed (CDF)](https://docs.databricks.com/aws/en/delta/delta-change-data-feed) enabled on the source table. If it's not enabled, run:
>
> ```sql
> ALTER TABLE <catalog>.<schema>.<table> SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
> ```
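>
> To confirm the property took effect, query it back:
>
> ```sql
> SHOW TBLPROPERTIES <catalog>.<schema>.<table> ('delta.enableChangeDataFeed');
> ```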

### Sync throughput

Autoscaling CUs are physically 8x smaller than Provisioned CUs, so per-CU throughput differs:

| Mode                                     | Rows/sec per CU |
| ---------------------------------------- | --------------- |
| **Snapshot** (initial + full refresh)    | ~2,000          |
| **Triggered / Continuous** (incremental) | ~150            |

> A 10x speedup for large-table snapshot sync (writing Postgres pages directly, leveraging separation of storage and compute) is coming for Autoscaling only.

### 1. Create a synced table

```bash
databricks database create-synced-database-table \
  --json '{
    "name": "<CATALOG>.<SCHEMA>.<SYNCED_TABLE_NAME>",
    "database_instance_name": "<INSTANCE_NAME>",
    "logical_database_name": "<POSTGRES_DATABASE>",
    "spec": {
      "source_table_full_name": "<CATALOG>.<SCHEMA>.<SOURCE_TABLE>",
      "primary_key_columns": ["<PRIMARY_KEY_COLUMN>"],
      "scheduling_policy": "<SNAPSHOT|TRIGGERED|CONTINUOUS>",
      "create_database_objects_if_missing": true
    }
  }' --profile <PROFILE>
```

> If your Lakebase database is **registered as a Unity Catalog catalog**, you can omit `database_instance_name` and `logical_database_name`.
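
For orientation, here is the same call with hypothetical values filled in (example names only, not defaults):

```bash
# Hypothetical: mirror main.gold.user_profiles into the app_db database
# on the my-autoscaling-project instance, keyed on user_id.
databricks database create-synced-database-table \
  --json '{
    "name": "main.gold.user_profiles_synced",
    "database_instance_name": "my-autoscaling-project",
    "logical_database_name": "app_db",
    "spec": {
      "source_table_full_name": "main.gold.user_profiles",
      "primary_key_columns": ["user_id"],
      "scheduling_policy": "TRIGGERED",
      "create_database_objects_if_missing": true
    }
  }' --profile DEFAULT
```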

Verify:

```bash
databricks database get-synced-database-table <CATALOG>.<SCHEMA>.<SYNCED_TABLE_NAME> --profile <PROFILE>
```

> **Important:** If your Autoscaling project was created via the `/postgres/` API (not `/database/`), programmatic synced table creation is not yet available via CLI. Use the Databricks UI as a fallback. In **Catalog**, select the source table → **Create synced table**, then choose your Lakebase project, branch, sync mode, and pipeline. This gap is expected to close soon.

### 2. Configure pipeline reuse

How you set up pipelines depends on your sync mode:

| Sync mode                | Recommendation                         | Why                                                                                                                  |
| ------------------------ | -------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| **Continuous**           | **Reuse** a pipeline across ~10 tables | Cost-advantageous (e.g., 1 pipeline for 10 tables ≈ $204/table/month vs $2,044/table/month for individual pipelines) |
| **Snapshot / Triggered** | **Separate** pipelines per table       | Allows re-snapshotting individual tables without impacting others                                                    |
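
To reuse a pipeline across Continuous tables, create the first synced table normally, read its pipeline ID (the Step 3 snippet shows how), and point later tables at it. The sketch below assumes the spec accepts an `existing_pipeline_id` field — confirm the field name with `databricks database create-synced-database-table --help` on your CLI version:

```bash
# Sketch: attach a second continuous synced table to an existing
# pipeline instead of provisioning a new one. The existing_pipeline_id
# field is an assumption; verify it against your CLI/API version.
databricks database create-synced-database-table \
  --json '{
    "name": "<CATALOG>.<SCHEMA>.<SECOND_SYNCED_TABLE>",
    "database_instance_name": "<INSTANCE_NAME>",
    "logical_database_name": "<POSTGRES_DATABASE>",
    "spec": {
      "source_table_full_name": "<CATALOG>.<SCHEMA>.<SECOND_SOURCE_TABLE>",
      "primary_key_columns": ["<PRIMARY_KEY_COLUMN>"],
      "scheduling_policy": "CONTINUOUS",
      "existing_pipeline_id": "<PIPELINE_ID_FROM_FIRST_TABLE>",
      "create_database_objects_if_missing": true
    }
  }' --profile <PROFILE>
```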

### 3. Schedule ongoing syncs

The initial snapshot runs automatically on creation. For **Snapshot** and **Triggered** modes, subsequent syncs must be started explicitly: on demand, on a schedule, or via a table-update trigger.

> **Note:** Table-update triggers for sync pipelines are not yet available via CLI and must be configured through the Databricks UI: **Workflows** → create/open a job → add a **Database Table Sync pipeline** task → **Schedules & Triggers** → add a **Table update** trigger pointing to your source table.

Trigger a sync update programmatically via the Databricks SDK:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Look up the synced table to find the Lakeflow pipeline that drives it.
table = w.database.get_synced_database_table(
    name="<CATALOG>.<SCHEMA>.<SYNCED_TABLE_NAME>"
)
pipeline_id = table.data_synchronization_status.pipeline_id

# Start an incremental sync update on that pipeline.
w.pipelines.start_update(pipeline_id=pipeline_id)
```
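
If you need to block until the update finishes, a polling loop is a reasonable sketch. This continues from the snippet above; the `get_update` call and the terminal state names are assumed from the pipelines API, so confirm them against your SDK version:

```python
import time

resp = w.pipelines.start_update(pipeline_id=pipeline_id)

# Poll until the update reaches a terminal state.
while True:
    update = w.pipelines.get_update(
        pipeline_id=pipeline_id, update_id=resp.update_id
    ).update
    if update.state.value in ("COMPLETED", "FAILED", "CANCELED"):
        print(f"Sync update finished: {update.state.value}")
        break
    time.sleep(15)
```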

### 4. Query the synced data in Postgres

Once synced, the table is available in Lakebase Postgres. The Unity Catalog schema becomes the Postgres schema:

```sql
SELECT * FROM "<schema>"."<synced_table_name>" WHERE "user_id" = 12345;
```

Connect with any standard Postgres client (psql, DBeaver, your application's Postgres driver).
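
For example, with psql (placeholders throughout; copy the real host and role from the Lakebase instance's connection details in the workspace UI):

```bash
psql "host=<instance-host> port=5432 dbname=<POSTGRES_DATABASE> user=<postgres-role> sslmode=require" \
  -c 'SELECT * FROM "<schema>"."<synced_table_name>" LIMIT 5;'
```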

### What you end up with

- A **synced table** in Unity Catalog that tracks the sync pipeline
- A **read-only Postgres table** in Lakebase that your apps can query with sub-10ms latency
- A **managed Lakeflow pipeline** that keeps the data in sync based on your chosen mode
- Up to **16 connections** per sync to your Lakebase database

### Important constraints

- **Primary key is mandatory.** A primary key enables efficient point lookups and incremental updates; synced tables reject sources without one. Rows with nulls in PK columns are excluded from the sync.
- **Duplicate primary keys fail the sync** unless you configure a `timeseries_key` for deduplication (latest value wins per PK). Using a timeseries key has a performance penalty.
- **Schema changes**: For Triggered/Continuous mode, only **additive** changes (e.g., adding a column) propagate. Dropping or renaming columns requires recreating the synced table.
- **FGAC tables**: Direct sync of Fine-Grained Access Control tables fails. **Workaround**: create a plain view (`SELECT * FROM table`) and sync the view in Snapshot mode (see the sketch after this list). Caveat: the sync runs as its creator and only sees the rows visible to them.
- **Connection limits**: Autoscaling supports up to 4,000 concurrent connections (varies by compute size). Each sync uses up to 16 connections.
- **Read-only in Postgres**: Synced tables should only be read from Postgres. Writing to them interferes with the sync pipeline.
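
A minimal sketch of the FGAC workaround from the list above (hypothetical table names):

```sql
-- Plain view over the FGAC-protected table; sync this view in Snapshot
-- mode. The sync runs as the view creator and sees only their rows.
CREATE VIEW main.gold.user_profiles_for_sync AS
SELECT * FROM main.gold.user_profiles;
```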

### Cost guidance

Cost formula (per sync): `[Rows / (Speed × CUs × 3600)] × DLT hourly rate`. Multiply by syncs per month for a monthly figure; a Continuous pipeline runs around the clock (≈730 hours/month).

Example costs (181M rows, 1 CU, $2.80/hr DLT rate):

| Mode                               | Monthly cost |
| ---------------------------------- | ------------ |
| Snapshot (daily)                   | ~$2,110      |
| Triggered (daily, 5% changes)      | ~$1,407      |
| Continuous (10 tables, 1 pipeline) | ~$204/table  |
| Continuous (1 table, 1 pipeline)   | ~$2,044      |
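
To sanity-check the table, a quick back-of-envelope calculation in Python (all inputs come from the tables above):

```python
# Snapshot (daily): 181M rows at ~2,000 rows/sec on 1 CU.
rows, speed, cus, dlt_rate = 181_000_000, 2_000, 1, 2.80

hours_per_sync = rows / (speed * cus * 3600)      # ~25.1 hours per snapshot
monthly = hours_per_sync * dlt_rate * 30          # one snapshot per day
print(f"{hours_per_sync:.1f} h/sync, ~${monthly:,.0f}/month")  # ~$2,112

# Continuous (1 table): the pipeline runs 24/7, about 730 hours/month.
print(f"~${730 * dlt_rate:,.0f}/month")           # ~$2,044
```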

### Troubleshooting

| Issue                               | Fix                                                                                       |
| ----------------------------------- | ----------------------------------------------------------------------------------------- |
| CDF not enabled warning             | Run `ALTER TABLE ... SET TBLPROPERTIES (delta.enableChangeDataFeed = true)` on the source |
| Schema not visible in create dialog | Confirm you have `USE_SCHEMA` and `CREATE_TABLE` on the target schema                     |
| Null bytes in string columns        | Clean source data: `SELECT REPLACE(col, CAST(CHAR(0) AS STRING), '') AS col FROM table`   |
| Sync failing                        | Check the pipeline in the synced table's Overview tab for error details                   |
| FGAC table sync fails               | Create a view over the table and sync the view in Snapshot mode                           |
| Duplicate primary key failure       | Add a `timeseries_key` to deduplicate (latest wins)                                       |

#### References

- [Synced tables (Autoscaling)](https://docs.databricks.com/aws/en/oltp/projects/sync-tables)
- [Change Data Feed](https://docs.databricks.com/aws/en/delta/delta-change-data-feed)
- [Lakebase Autoscaling](https://docs.databricks.com/aws/en/oltp/projects/)
- [DevHub: Data Lakehouse overview](https://databricks.com/devhub/docs/lakehouse/overview)
