Bringing Predictive Intelligence to Conversational BI with Genie, TabPFN, and Agent Bricks
by Ryuta Yoshimatsu, Javier Poveda Panter, Dominik Safaric, Philipp Singer, Diana Kriuchkova, Sauraj Gambhir, Dael Williamson and Bryan Smith
Business intelligence has always been about answering questions. For most organizations, those questions have been descriptive — what happened last quarter? — or diagnostic — why did churn spike in the Southeast? Databricks Genie has made these questions radically more accessible, enabling business users to get answers in natural language without writing SQL or waiting on an analyst.
But the questions that drive the most consequential decisions are predictive. Which customers are likely to churn next quarter? How will demand shift if we adjust pricing? How likely is this loan applicant to default? Answering these has historically required an entirely different set of tools, skills, and teams — a data scientist exploring the data, validating its fitness for prediction, engineering features, training a model, and maintaining that model as conditions change. The result: a hard boundary between the BI world, where business users operate with confidence, and the predictive analytics world, where only specialized teams can tread.
In a previous blog post, we showed how TabPFN — a foundation model for tabular data from Prior Labs — collapses much of that predictive workflow by delivering production-grade predictions in a single forward pass. But a key bottleneck remained: someone still needed to translate the business question into a well-formed dataset before TabPFN could make a prediction. The model may be instant, but the work that feeds it is not.
This is where Genie's role shifts from answering questions to enabling predictions. Genie already understands an organization's data: its schemas, relationships, and business semantics. By combining Genie with TabPFN within a multi-agent orchestrator, we create a closed loop. Genie dynamically translates a natural language question into the precise input data TabPFN needs, and TabPFN transforms that data into a prediction in a single forward pass. Every predictive question asked during the conversation receives a tailored response on the fly. The space of questions you can answer becomes essentially unbounded: any question that can be framed as "given historical data with an outcome, predict an outcome for a new scenario" can be answered in seconds.
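The framing "given historical data with an outcome, predict an outcome for a new scenario" can be written down as a small data contract between the two agents. The sketch below is a hypothetical illustration, not the accelerator's actual schema: `PredictionTask`, its fields, and `to_matrices` are names introduced here. It shows what Genie has to hand over (labeled historical rows plus unlabeled rows describing the new scenario) and how those split into the feature matrix and label vector that a scikit-learn-style `fit(X, y)` / `predict_proba(X)` model consumes. Categorical encoding is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PredictionTask:
    """Hypothetical hand-off from Genie (producer) to TabPFN (consumer)."""

    label_column: str                      # historical outcome to learn from
    training_rows: list[dict[str, Any]]    # past examples, label included
    inference_rows: list[dict[str, Any]]   # new scenario(s), label unknown
    feature_columns: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Default: every column except the label is treated as a feature.
        if not self.feature_columns and self.training_rows:
            self.feature_columns = [
                c for c in self.training_rows[0] if c != self.label_column
            ]

    def to_matrices(self):
        """Return (X_train, y_train, X_test) as plain lists of lists."""
        x_train = [[r[c] for c in self.feature_columns] for r in self.training_rows]
        y_train = [r[self.label_column] for r in self.training_rows]
        x_test = [[r[c] for c in self.feature_columns] for r in self.inference_rows]
        return x_train, y_train, x_test


task = PredictionTask(
    label_column="won",
    training_rows=[
        {"promotion": "discount", "deal_size": 120, "won": 1},
        {"promotion": "bundle", "deal_size": 80, "won": 0},
    ],
    inference_rows=[{"promotion": "discount", "deal_size": 95}],
)
X_train, y_train, X_test = task.to_matrices()
```

The point of making the contract explicit is that every new question only changes the rows Genie fills in; the shape of the hand-off to the model stays the same.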
The result is a single, governed experience — grounded in Lakehouse data with full lineage and access control through Unity Catalog — where business users ask predictive questions in the same conversational interface they use for descriptive analytics.
In this post, we walk through the application architecture that makes this possible, introducing each technical component and showing how they come together to deliver predictive intelligence directly within conversational BI.
Video 1. Interacting with a multi-agent supervisor with Genie and TabPFN via a Databricks Apps interface
The system is built as a multi-agent orchestrator deployed as a Databricks App. It connects the primary components using Agent Bricks, a platform for building and deploying enterprise agents on Databricks. Genie acts as a subagent for structured SQL analytics over governed Lakehouse data. TabPFN is connected to Unity Catalog as an external MCP server. The design is extensible: further subagents, model serving endpoints, other Databricks Apps, or additional MCP servers can be added as needed.
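Because TabPFN sits behind an MCP server, the orchestrator reaches it the way it would reach any MCP tool: with a JSON-RPC 2.0 `tools/call` request that names the tool and carries its arguments. The envelope below follows the Model Context Protocol specification; the tool name `tabpfn_predict` and the shape of `arguments` are hypothetical, since the real schema is whatever the server advertises through `tools/list`. In practice an MCP client library handles transport and session setup; this sketch only shows what travels over the wire.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this client session.
_request_ids = count(1)


def build_tools_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument schema for a TabPFN MCP server.
payload = build_tools_call(
    "tabpfn_predict",
    {
        "task": "classification",
        "label_column": "won",
        "train": [
            {"promotion": "discount", "won": 1},
            {"promotion": "bundle", "won": 0},
        ],
        "test": [{"promotion": "discount"}],
    },
)
```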
When a predictive question arrives, the orchestrator executes an agentic workflow. It interprets the user’s business intent. If answering the question requires predictive analysis, it queries Genie to extract the appropriate labeled data from the Lakehouse. After it has gathered all necessary data, it calls TabPFN, passing this data to the model in the right format. Finally, the supervisor interprets the predictions and delivers an actionable recommendation to the user (Figure 1).
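The four steps of this workflow can be sketched as a single function with its collaborators injected, which makes the control flow explicit and testable without any live service. This is a hypothetical illustration, not the accelerator's code: in the real system the supervisor is an LLM agent that decides routing itself, Genie answers over Unity Catalog tables, and prediction happens through the TabPFN MCP server. Here `answer`, `fake_genie`, `fake_predict`, `fake_explain`, and the keyword-based `is_predictive` router are all toy stand-ins introduced for the example.

```python
from typing import Callable

Rows = list[dict]


def answer(
    question: str,
    needs_prediction: Callable[[str], bool],
    ask_genie: Callable[[str, str], Rows],
    predict: Callable[[Rows, Rows, str], list[float]],
    explain: Callable[[str, Rows, list[float]], str],
    label: str = "won",
) -> str:
    """One supervisor turn: interpret, gather data, predict, recommend."""
    # 1. Interpret intent; descriptive questions never reach the predictor.
    if not needs_prediction(question):
        return explain(question, ask_genie(question, "describe"), [])
    # 2. Ask Genie for labeled history and for the scenario(s) to score.
    train = ask_genie(question, "train")
    test = ask_genie(question, "test")
    # 3. Predict (TabPFN behind an MCP server in the real system).
    scores = predict(train, test, label)
    # 4. Interpret the scores as a recommendation.
    return explain(question, test, scores)


# Toy stand-ins so the control flow runs without any external service.
history = [{"promotion": "discount", "won": 1}, {"promotion": "bundle", "won": 0}]
scenario = [{"promotion": "discount"}, {"promotion": "bundle"}]


def fake_genie(question: str, purpose: str) -> Rows:
    return scenario if purpose == "test" else history


def fake_predict(train: Rows, test: Rows, label: str) -> list[float]:
    observed = {r["promotion"]: r[label] for r in train}
    return [float(observed.get(r["promotion"], 0.5)) for r in test]


def fake_explain(question: str, rows: Rows, scores: list[float]) -> str:
    if not scores:
        return f"{len(rows)} rows"
    best = max(range(len(rows)), key=lambda i: scores[i])
    return f"Recommend {rows[best]['promotion']} (p={scores[best]:.2f})"


def is_predictive(q: str) -> bool:
    return "likely" in q.lower()


reply = answer(
    "Which promotion type would most likely close the Horton-Cross deal?",
    is_predictive, fake_genie, fake_predict, fake_explain,
)
summary = answer(
    "How many deals closed last quarter?",
    is_predictive, fake_genie, fake_predict, fake_explain,
)
```

Injecting the collaborators keeps the routing logic separate from any particular agent framework, so the same loop can be exercised with fakes in tests and with real subagents in deployment.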

To make this concrete, consider what happens when a sales leader asks: "Which promotion type would most likely close the Horton-Cross deal?"
In a traditional workflow, answering this question requires a data scientist to understand the question and identify which tables and columns matter; extract the right training set from historical deals that include promotion types and win/loss outcomes; select an algorithm, tune hyperparameters, and validate performance; prepare inference data specific to the Horton-Cross deal; run the model; and translate the output into a business recommendation. Each of these steps takes time, expertise, and iteration. And the next question — "What is the optimal date to follow up to maximize win probability?" — requires an entirely different model built from scratch.
Now consider what happens with Genie and TabPFN under the same multi-agent supervisor. The supervisor interprets the natural language question and its semantic intent, then translates that intent into a specific request for Genie to generate a dataset. Genie recognizes that answering this question requires historical opportunities joined with promotions and accounts, using win or loss as the label, and generates precise SQL to extract this data instantly.
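To make "historical opportunities joined with promotions and accounts, using win or loss as the label" concrete, here is the kind of SQL Genie might produce, run against a toy in-memory SQLite schema. The tables, columns, and stage values are hypothetical rather than the accelerator's sample data, and Genie would emit Databricks SQL against Unity Catalog tables, not SQLite. The key moves are the ones a data scientist would otherwise make by hand: keep only closed deals so every training row has a known outcome, derive a binary label from the stage, and fetch the open Horton-Cross deal separately as the row to score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT, industry TEXT);
CREATE TABLE promotions (promotion_id INTEGER PRIMARY KEY, promotion_type TEXT);
CREATE TABLE opportunities (
    opportunity_id INTEGER PRIMARY KEY,
    account_id INTEGER REFERENCES accounts(account_id),
    promotion_id INTEGER REFERENCES promotions(promotion_id),
    amount REAL,
    stage TEXT
);
INSERT INTO accounts VALUES
    (1, 'Horton-Cross', 'Retail'), (2, 'Acme', 'Retail'), (3, 'Globex', 'Energy');
INSERT INTO promotions VALUES (1, 'discount'), (2, 'bundle');
INSERT INTO opportunities VALUES
    (10, 2, 1, 50000, 'Closed Won'),
    (11, 3, 2, 80000, 'Closed Lost'),
    (12, 3, 1, 20000, 'Closed Won'),
    (13, 1, NULL, 65000, 'Negotiation');
""")

# Training set: closed deals only, with a binary label derived from the stage.
TRAINING_SQL = """
SELECT a.industry, p.promotion_type, o.amount,
       CASE WHEN o.stage = 'Closed Won' THEN 1 ELSE 0 END AS won
FROM opportunities o
JOIN accounts a ON a.account_id = o.account_id
JOIN promotions p ON p.promotion_id = o.promotion_id
WHERE o.stage IN ('Closed Won', 'Closed Lost')
ORDER BY o.opportunity_id
"""

# Inference row: the still-open deal we want a prediction for.
INFERENCE_SQL = """
SELECT a.industry, o.amount
FROM opportunities o
JOIN accounts a ON a.account_id = o.account_id
WHERE a.name = 'Horton-Cross'
  AND o.stage NOT IN ('Closed Won', 'Closed Lost')
"""

training_rows = conn.execute(TRAINING_SQL).fetchall()
deal = conn.execute(INFERENCE_SQL).fetchone()
```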
TabPFN receives that dataset and generates predictions in a single forward pass — no feature preprocessing, no model selection, no hyperparameter tuning. Finally, the supervisor returns a clear, data-driven recommendation. The entire pipeline — from question to prediction — assembles itself from natural language in a single conversation turn.
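"Which promotion type would most likely close the deal?" is a what-if query: score one copy of the Horton-Cross row per candidate promotion type and rank the predicted win probabilities. The sketch below shows that pattern with a deliberately simple stand-in model, a Laplace-smoothed historical win rate per promotion type that ignores all other columns, in place of TabPFN so that it runs without dependencies. `rank_interventions` and `fit_win_rate` are hypothetical names. In the real pipeline, `predict_proba` would instead wrap a `TabPFNClassifier` from the `tabpfn` package, which follows the scikit-learn `fit` / `predict_proba` convention, fitted on the training set Genie extracted.

```python
from typing import Callable, Sequence

Row = dict


def rank_interventions(
    deal: Row,
    column: str,
    candidates: Sequence[str],
    predict_proba: Callable[[list[Row]], list[float]],
) -> list[tuple[str, float]]:
    """Score one what-if copy of `deal` per candidate value, best first."""
    scenarios = [{**deal, column: value} for value in candidates]
    probs = predict_proba(scenarios)
    return sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)


def fit_win_rate(train: list[Row], column: str, label: str, prior: float = 1.0):
    """Toy stand-in for TabPFN: Laplace-smoothed win rate per value of `column`."""
    wins: dict = {}
    totals: dict = {}
    for r in train:
        wins[r[column]] = wins.get(r[column], 0) + r[label]
        totals[r[column]] = totals.get(r[column], 0) + 1

    def predict_proba(rows: list[Row]) -> list[float]:
        # Unseen values fall back to the prior: prior / (2 * prior) = 0.5.
        return [
            (wins.get(r[column], 0) + prior) / (totals.get(r[column], 0) + 2 * prior)
            for r in rows
        ]

    return predict_proba


train = [
    {"promotion_type": "discount", "won": 1},
    {"promotion_type": "bundle", "won": 0},
    {"promotion_type": "discount", "won": 1},
]
deal = {"account": "Horton-Cross", "industry": "Retail", "amount": 65000.0}
ranking = rank_interventions(
    deal, "promotion_type", ["bundle", "discount", "webinar"],
    fit_win_rate(train, "promotion_type", "won"),
)
```

The follow-up question about when to contact the customer fits the same function: pass a hypothetical `follow_up_day` column with a list of candidate days. Only the data Genie retrieves changes, which is why no new model has to be built per question.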
The pattern has limitations: TabPFN is only as good as the data Genie produces. If Genie cannot construct a meaningful dataset with a clear label column for a given question (because the schema does not capture the right signal, the necessary joins do not exist, or the outcome is not represented in the data), the prediction will not be reliable, however capable TabPFN is. See the best practices for building an effective Genie space here. Beyond data quality, there is a broader risk that an agent may hallucinate or omit key information during a multi-turn conversation.
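One practical mitigation is a cheap gate between Genie and TabPFN that rejects datasets unable to support a prediction, so the supervisor can ask a clarifying question instead of returning a confident but meaningless answer. The checks below are a hypothetical sketch of that idea: `validate_training_set` and every threshold in it are illustrative assumptions, not values taken from TabPFN's documentation or from the accelerator.

```python
from collections import Counter


def validate_training_set(
    train: list[dict],
    test: list[dict],
    label: str,
    min_rows: int = 30,
    max_rows: int = 10_000,
    min_per_class: int = 5,
) -> list[str]:
    """Return human-readable problems; an empty list means 'safe to predict'.

    All thresholds are illustrative assumptions, not documented limits.
    """
    if not train:
        return ["Genie returned no training rows"]
    problems = []
    # A usable label must be present on every historical row.
    if any(r.get(label) is None for r in train):
        problems.append(f"label column '{label}' is missing or null in some rows")
    n = len(train)
    if n < min_rows:
        problems.append(f"only {n} training rows (< {min_rows})")
    if n > max_rows:
        problems.append(f"{n} training rows exceeds the configured limit of {max_rows}")
    # Classification needs at least two outcomes, each reasonably represented.
    counts = Counter(r[label] for r in train if r.get(label) is not None)
    if len(counts) < 2:
        problems.append("label has fewer than two distinct values")
    elif min(counts.values()) < min_per_class:
        problems.append(f"rarest class has only {min(counts.values())} rows")
    # The scenario to score must carry the same features the model learns from.
    if not test:
        problems.append("no inference rows describe the new scenario")
    features = set(train[0]) - {label}
    for i, row in enumerate(test):
        missing = features - set(row)
        if missing:
            problems.append(f"inference row {i} lacks {sorted(missing)}")
    return problems


good_train = [{"promotion": "discount", "won": i % 2} for i in range(40)]
ok = validate_training_set(good_train, [{"promotion": "bundle"}], "won")
bad = validate_training_set([{"promotion": "discount", "won": 1}] * 3, [{}], "won")
```

Returning a list of reasons rather than a boolean gives the supervisor something concrete to relay to the user, for example that the data holds no lost deals to learn from.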
That is exactly why systematic evaluation is essential. Unlike a static ML pipeline that must be validated once before deployment, this system dynamically constructs a distinct ML problem for each question. We need an evaluation framework to understand where the boundary lies: which classes of questions produce reliable predictions, and which ones exceed what Genie can express as a well-formed training set.
The solution accelerator ships with a comprehensive evaluation harness built on MLflow’s GenAI evaluation framework. It runs against the live agent and logs results to MLflow Experiment Tracking, giving teams a single pane of glass to evaluate and monitor quality over time. You can find the full details here.
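The harness itself is built on MLflow's GenAI evaluation framework (in MLflow 3 the entry point is `mlflow.genai.evaluate`, which combines an evaluation dataset, a predict function, and scorers; check the MLflow documentation for the exact signature). The sketch below does not use MLflow. It only illustrates the aggregation that answers the question raised above, namely which classes of questions produce reliable predictions: each hypothetical `EvalCase` is tagged with a question class and an expected answer, each scorer returns pass or fail, and results roll up per class. All names, questions, and canned responses are toy assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalCase:
    question: str
    question_class: str   # e.g. "promotion choice", "follow-up timing"
    expected: str         # answer key for this case


Scorer = Callable[[EvalCase, str], bool]


def pass_rate_by_class(
    cases: list[EvalCase],
    agent: Callable[[str], Optional[str]],
    scorers: list[Scorer],
) -> dict[str, float]:
    """Share of cases per question class on which every scorer passes."""
    passed: defaultdict = defaultdict(int)
    total: defaultdict = defaultdict(int)
    for case in cases:
        response = agent(case.question) or ""
        passed[case.question_class] += all(s(case, response) for s in scorers)
        total[case.question_class] += 1
    return {cls: passed[cls] / total[cls] for cls in total}


def non_empty(case: EvalCase, response: str) -> bool:
    return bool(response.strip())


def mentions_expected(case: EvalCase, response: str) -> bool:
    return case.expected.lower() in response.lower()


cases = [
    EvalCase("Best promotion for Horton-Cross?", "promotion choice", "discount"),
    EvalCase("Best promotion for Acme?", "promotion choice", "bundle"),
    EvalCase("Best follow-up day for Globex?", "follow-up timing", "Tuesday"),
]
# Canned responses stand in for calls to the live agent.
canned = {
    "Best promotion for Horton-Cross?": "Recommend discount",
    "Best promotion for Acme?": "Recommend discount",
    "Best follow-up day for Globex?": "Follow up on Tuesday",
}
rates = pass_rate_by_class(cases, canned.get, [non_empty, mentions_expected])
```

Reporting pass rates per class, rather than one global score, is what exposes the boundary: a low rate for one class points at questions Genie cannot yet turn into a well-formed training set.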
Video 2. Evaluating a multi-agent supervisor with Genie and TabPFN via the Databricks Experiments interface.
Without this evaluation loop, the system may confidently return predictions with no way to tell trustworthy ones from unreliable ones. The harness works at two levels: it catches conversational and behavioral regressions, and it validates the end-to-end correctness of the predictive pipeline. Together, these checks give teams the confidence to deploy this pattern in production, with a clear understanding of which question classes produce reliable predictions and where the system's boundaries lie.
The combination of Genie, TabPFN, and Agent Bricks reframes the relationship between descriptive and predictive analytics. Genie becomes the feature engineering layer. TabPFN removes the training and maintenance overhead. Agent Bricks provides the orchestration and governance backbone, while MLflow evaluates and monitors the quality of the responses. The result is that business users can ask predictive questions in the same conversational interface they already use for descriptive analytics.
The full Solution Accelerator is available here. The repository includes sample data generation, Genie Space configuration, and the end-to-end evaluation harness described above. The pattern is domain-agnostic: while the accelerator demonstrates enterprise sales analytics, the same architecture applies to any domain where structured data with outcomes exists, including healthcare risk scoring, manufacturing quality prediction, financial fraud detection, customer churn analysis, and beyond.
Get started today and bring predictive intelligence to the conversations your teams are already having.