Baseball moves fast, and it is defined by small moments: one pitch, one matchup, one decision. This story follows how a modern clubhouse uses Databricks to turn high-fidelity pitch data into decisions that help win games.

The hitters file into the video room. The coach does not want a 30‑page printout; they want a crisp plan for tonight’s starter.
Earlier that day, the analyst sat at their laptop and opened Genie, on top of Unity Catalog, where Statcast and team‑derived tables live with consistent schemas, permissions, and lineage. They asked:
“For tonight’s starter, show first‑pitch mix and locations to our right‑handed hitters and left‑handed hitters over the last two seasons. Highlight trends when runners are on base.”
Genie compiled the answer from governed Delta tables in Unity Catalog. As part of that work, the analyst also registered a set of Unity Catalog SQL functions that encapsulate the key queries, such as tendencies by count, hand, and base‑runner state, so they can reuse them in future planning and in automated agents.
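To make the idea of an encapsulated "tendencies" query concrete, here is a minimal sketch in plain Python of the aggregation such a function might perform: first-pitch mix split by batter hand and base-runner state. The record fields (`pitch_number`, `batter_hand`, `runners_on`, `pitch_type`), the sample rows, and the function name `first_pitch_mix` are all hypothetical and chosen for illustration; they are not the Statcast schema or the club's actual Unity Catalog function.

```python
from collections import Counter, defaultdict

# Hypothetical pitch records. Field names and values are illustrative only,
# not a real Statcast schema or real pitch data.
PITCHES = [
    {"pitch_number": 1, "batter_hand": "R", "runners_on": False, "pitch_type": "FF"},
    {"pitch_number": 1, "batter_hand": "R", "runners_on": False, "pitch_type": "SL"},
    {"pitch_number": 1, "batter_hand": "R", "runners_on": True, "pitch_type": "SL"},
    {"pitch_number": 1, "batter_hand": "L", "runners_on": False, "pitch_type": "FF"},
    {"pitch_number": 1, "batter_hand": "L", "runners_on": True, "pitch_type": "CH"},
    {"pitch_number": 2, "batter_hand": "L", "runners_on": True, "pitch_type": "FF"},
]


def first_pitch_mix(pitches):
    """Return {(batter_hand, runners_on): {pitch_type: share}} for first pitches only."""
    counts = defaultdict(Counter)
    for pitch in pitches:
        # Only the first pitch of each plate appearance counts toward the mix.
        if pitch["pitch_number"] == 1:
            key = (pitch["batter_hand"], pitch["runners_on"])
            counts[key][pitch["pitch_type"]] += 1

    mix = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Convert raw counts into shares that sum to 1.0 within each split.
        mix[key] = {pitch_type: n / total for pitch_type, n in counter.items()}
    return mix


if __name__ == "__main__":
    for split, shares in sorted(first_pitch_mix(PITCHES).items()):
        print(split, shares)
```

In a governed setup, the same grouping would more likely live in a SQL function over Delta tables so that Genie and agents can call it by name; the Python version only shows the shape of the calculation.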
The analyst exported the results into a simple one‑pager the staff could print or include in hitters’ binders. The key points were:
The hitting coach walks into the meeting with three clear talking points. By the time players head to batting practice, the first two trips through the order are not guesses; they are anchored in a shared view of how tonight’s starter actually pitches.
The staff knows there will be a point in most games when the starter is near 100 pitches and the heart of the order is coming up. The choice between a sinkerballer and a slider‑first righty will feel like a gut call in the moment, but the work happens earlier.
In the clubhouse before the series, the analyst uses a Multi-Agent Supervisor, built with Agent Bricks and deployed on Model Serving, to simulate the pockets the staff cares about: heart of the order in the sixth, bottom third in the seventh, lefty‑heavy clusters in the late innings.
For each decision, the agent:
The analyst turns this into a short bullpen card. For example:
The staff prints the card and reviews it together. When the actual sixth‑inning situation appears during the game, no one is logging into Databricks. The pitching coach is following a decision tree the staff already pressure‑tested with the agent hours before.
Pinch‑hit choices in the eighth inning are rehearsed the same way.
As part of pre‑game prep, the analyst asks the Databricks agent:
“For the likely late‑inning relievers we will see in this series, rank our bench bats by expected outcome, and explain when each is the better option.”
The agent calls the same UC functions and queries the same Delta tables in Unity Catalog to:
The analyst drops these recommendations into the manager’s game card or a small one‑page “pinch‑hit grid” that can be reviewed in advance. Once the game starts, the card becomes the reference point. The manager is choosing between options they have already walked through, with the data distilled into a format that respects league rules about devices in the dugout.
On the off day between series, the analyst turns from single‑game tactics to what is coming next. Two upcoming starters have limited direct history against the lineup.
Back in Genie, they ask:
“Find pitchers whose arsenals and movement profiles are most similar to our upcoming starters, then show how our lineup has fared against those comparable arms.”
Here, Genie hands part of the job to Databricks Vector Search. Pitcher and hitter embeddings, stored in Unity Catalog from prior processing, are indexed so the system can find “similar pitchers” without guessing by eye.
The workflow is:
When head‑to‑head Statcast history is thin, this combination of Vector Search and Genie gives the staff a way to say, “Here is how we have hit pitchers who look like this,” and bake that into the series plan. Those insights are then exported into the advance report, ready for the next road meeting.
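The "similar pitchers" lookup can be sketched as a nearest-neighbor search over embeddings. The example below uses brute-force cosine similarity in plain Python as a stand-in for a Vector Search index; it does not call any Databricks API. The pitcher names, the three-dimensional vectors, and the helper names are hypothetical, and real embeddings would have many more dimensions.

```python
import math

# Hypothetical pitcher embeddings, invented for illustration.
EMBEDDINGS = {
    "upcoming_starter": [0.9, 0.1, 0.3],
    "pitcher_a": [0.88, 0.12, 0.28],
    "pitcher_b": [0.1, 0.9, 0.2],
    "pitcher_c": [0.7, 0.2, 0.5],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(target, embeddings, k=2):
    """Return the k (name, score) pairs closest to `target`, best first."""
    query = embeddings[target]
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in embeddings.items()
        if name != target  # never return the query pitcher as its own match
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for name, score in most_similar("upcoming_starter", EMBEDDINGS):
        print(f"{name}: {score:.3f}")
```

Once the comparable arms are identified, the lineup's results against those pitchers can be pulled with an ordinary query, which is the step the staff actually reads in the advance report.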
Winning seasons are built on more than one game. The GM and analysts use the same platform to make calls about value, fit, and risk.
In Genie, they explore questions like:
“Show how our number three starter’s profile plays against the top lineups in our division by count and hand. Where does his value come from, and where are we exposed?”
“For left‑handed bats around the league, identify players whose strengths match up with how our division is pitched in late innings.”
These questions are answered directly from the lakehouse in Unity Catalog. Pitch‑level data, embeddings, and derived features are all governed in one place. Genie turns them into natural‑language answers, but under the hood the logic still lives in reusable Unity Catalog SQL functions.
Meanwhile, the baseball operations app that coaches, scouts, and the front office use is backed by Lakebase Postgres. That app is where:
Because Lakebase Postgres is part of the Databricks platform, app state is kept close to the source data:
The result is shared memory. What happened, why it happened, and how it was justified are stored in one place, with timestamps and user identity.
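As a rough illustration of that shared memory, the sketch below records a decision with a timestamp, a user identity, and a rationale, then reads the history back. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in so the example runs anywhere; it is not Lakebase Postgres, and the table name `decision_log`, its columns, and the sample entry are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def open_log():
    """Create an in-memory database with a hypothetical decision_log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE decision_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            decided_at TEXT NOT NULL,   -- ISO 8601 timestamp in UTC
            decided_by TEXT NOT NULL,   -- user identity
            decision TEXT NOT NULL,     -- what happened
            rationale TEXT NOT NULL     -- why it was justified
        )
        """
    )
    return conn


def record_decision(conn, user, decision, rationale):
    """Append one decision with the current UTC time and the acting user."""
    conn.execute(
        "INSERT INTO decision_log (decided_at, decided_by, decision, rationale) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user, decision, rationale),
    )
    conn.commit()


def history(conn):
    """Return all logged decisions in the order they were recorded."""
    return conn.execute(
        "SELECT decided_at, decided_by, decision, rationale "
        "FROM decision_log ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    conn = open_log()
    record_decision(
        conn,
        "pitching_coach",
        "Go to the slider-first righty in the sixth",
        "Matches the pre-game bullpen card for the heart of the order",
    )
    for row in history(conn):
        print(row)
```

The design point is simply that every entry carries who, when, and why alongside the decision itself, so later reviews do not depend on anyone's recollection.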
All of that only matters if the numbers are right. By running these agents and apps on top of a single governed lakehouse instead of scattered one‑off tools, clubs can see that the logic matches the work they already do and lean on it in big spots. When the data points to a specific matchup or move, it feels like an extension of the game plan, not a black box.
Learn more about Databricks Sports, or request a demo to see how your organization can drive competitive insights.
