Introduction
Marketing teams frequently struggle to access their own data and often depend on technical teams to translate that data into actionable insights. To bridge this gap, our Databricks Marketing team adopted AI/BI Genie, an LLM-powered, no-code experience that allows marketers to ask natural language questions and receive reliable, governed answers directly from their data.
What started as a prototype serving 10 users for one focused use case has evolved into a trusted self-service tool used by over 200 marketers and handling more than 800 queries per month. This post shares the lessons we learned along the way.
The Rise of “Marge”
Our Marketing Genie, affectionately named “Marge”, started as an experiment before the 2024 Data + AI Summit. Thomas Russell, Senior Marketing Analytics Manager, recognized Genie’s potential and configured a Genie space with relevant Unity Catalog tables, including customer accounts, program performance, and campaign attribution.
The image above shows our Marketing Genie “Marge” in action. While the data has been sanitized, it should give you the general idea.
Since launch, Marge has become a go-to resource for marketers who need fast, reliable insights without depending on analytics teams. We think of Genie as a smart intern who can deliver great results with guidance but still needs structure for more complex tasks. With that perspective, here are five key lessons that helped shape Genie into a powerful tool for marketing.
Lesson 1: Start small and focused
When creating a Genie space, it’s tempting to include all available data. However, starting small and focused is key to building an effective space. Think of it this way: fewer data points mean less chance of error for Genie. LLMs are probabilistic, meaning that the more options they have, the greater the chance of confusion.
So what does this mean? In practical terms:
- Select only relevant tables and columns: Include the fewest tables and columns needed to address the initial set of questions you want to answer. Aim for a cohesive and manageable dataset rather than including all tables in a schema.
- Iteratively expand tables and columns: Begin with a minimal setup and expand iteratively based on user feedback. Incorporate additional tables and columns only after users have identified a need for more data. This helps streamline the process and ensures the space evolves organically to meet real user needs.
Example: Our first marketing use case involved analyzing email campaign performance, so we started by including only tables with email campaign data, such as campaign details, recipient lists, and engagement metrics. We then expanded slowly to include additional data, like account details and campaign attribution, only after users provided feedback requesting more data.
Lesson 2: Annotate and document your data thoroughly
Even the smartest data analyst in the world would struggle to deliver insightful answers without first understanding your specific business concepts, terminology, and processes. For example, if a term like "Q1" means March through May for your team instead of the standard calendar definition, the most skilled expert would still need clear guidance to interpret it correctly. Genie operates in much the same way—it’s a powerful tool, but to perform at its best, it needs clear context and well-documented data to work from. Proper annotation and documentation are critical for this purpose. This includes:
- Define your data model (primary and foreign keys): Adding primary and foreign key relationships directly to the tables will significantly enhance Genie’s ability to generate accurate and meaningful responses. By explicitly defining how your data is connected, you help Genie understand how tables relate to one another, enabling it to create joins in queries.
- Embrace Unity Catalog for your metadata: Utilize Unity Catalog to manage your descriptive metadata effectively. Unity Catalog is a unified governance solution that provides fine-grained access controls, audit logs, and the ability to define and manage data classifications and descriptions across all data assets in your Databricks environment. By centralizing metadata management, you ensure that your data descriptions are consistent, accurate, and easily accessible.
- Leverage AI-generated comments: Unity Catalog can leverage AI to help generate initial metadata descriptions. While this automation speeds up the documentation process, final descriptions must be reviewed, modified, and approved by knowledgeable humans to ensure accuracy and relevance. Otherwise, inaccurate or incomplete metadata will confuse the Genie.
- Provide detailed business context: Beyond basic descriptions, annotations should provide business context to your data. This means explaining what each metric represents in terms that align with your organization's terminology and business processes. For instance, if “open_rate” refers to the percentage of recipients who opened an email, this should be clearly included in the column description. Adding some example values from the data is also extremely helpful.
Example: Create a column annotation for campaign_country with the description “Values are in the format of ISO 3166-1 alpha-2, for example: ‘US’, ‘DE’, ‘FR’, ‘BR’.” This will help the Genie know to use “DE” instead of “Germany” when it creates queries.
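To make the effect of that annotation concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a Databricks SQL warehouse, and the campaigns table and its rows are hypothetical. It shows only why the literal value matters: a filter on the country name silently returns nothing, while a filter on the ISO code returns the expected rows.

```python
import sqlite3

# Hypothetical table and rows; SQLite stands in for a Databricks SQL warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaigns (campaign_id INTEGER PRIMARY KEY, campaign_country TEXT)"
)
conn.executemany(
    "INSERT INTO campaigns VALUES (?, ?)",
    [(1, "US"), (2, "DE"), (3, "DE"), (4, "FR")],
)


def count_campaigns(country_value: str) -> int:
    """Count campaigns whose campaign_country equals the given literal."""
    row = conn.execute(
        "SELECT COUNT(*) FROM campaigns WHERE campaign_country = ?",
        (country_value,),
    ).fetchone()
    return row[0]


# Without the annotation, a question about Germany may be translated literally.
rows_for_name = count_campaigns("Germany")
# With the annotation, the filter uses the ISO 3166-1 alpha-2 code.
rows_for_code = count_campaigns("DE")
print(rows_for_name, rows_for_code)
```

The first query does not fail; it just returns zero rows, which is the kind of quiet error a column description with example values helps prevent.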
Lesson 3: Provide clear example queries, trusted assets, and text instructions
Effective implementation of a Databricks Genie space relies heavily on three things: example SQL, trusted assets, and clear text instructions. These techniques ensure accurate translation of natural language questions into SQL queries and consistent, reliable responses.
By combining clear instructions, example queries, and the use of trusted assets, you provide Genie with a comprehensive toolkit to generate accurate and reliable insights. This combined approach ensures that our marketing team can depend on Genie for consistent data insights, enhancing decision-making and driving successful marketing strategies.
Tips for adding effective instructions:
- Start small: Focus on essential instructions initially. Avoid overloading the space with too many instructions or examples upfront. A small, manageable number of instructions keeps the space efficient and avoids running into token limits.
- Be iterative: Add detailed instructions progressively based on real user feedback and testing. As you refine the space and identify gaps (e.g., misunderstood queries or recurring issues), introduce new instructions to address these specific needs instead of trying to preempt everything.
- Focus and clarity: Ensure that each instruction serves a specific purpose. Redundant or overly complex instructions should be avoided to streamline processing and improve response quality.
- Monitor and adjust: Continuously test the space’s performance by examining generated queries and collecting feedback from business users. Incorporate additional instructions only where necessary to improve accuracy or address shortcomings.
- Use general instructions: Some examples of when to leverage general instructions include:
  - To explain domain-specific jargon or terminology (e.g., “What does fiscal year mean in our company?”).
  - To clarify default behaviors or priorities (e.g., “When someone asks for 'top 10,' return results in descending order of revenue.”).
  - To establish overarching guidelines for interpreting general types of queries. For example:
    - “Our fiscal year starts in February, and 'Q1' refers to February through April.”
    - “When a question refers to 'active campaigns,' filter for campaigns with status = 'active' and end_date >= today.”
- Add example queries: We found that example queries offer the greatest impact when used as follows:
  - To address questions that Genie is unable to answer correctly based on table metadata alone.
  - To demonstrate how to handle derived concepts or scenarios involving complex logic.
  - To help Genie generalize an approach when users often ask similar questions with slight variations.
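Before writing a general instruction, it can help to confirm the rule is unambiguous. The sketch below restates the two sample instructions from the list above (the February fiscal-year start and the 'active campaigns' filter) as plain Python. The function names are illustrative only and are not part of Genie.

```python
from datetime import date

FISCAL_YEAR_START_MONTH = 2  # February, per the sample instruction above


def fiscal_quarter(day: date) -> int:
    """Return the fiscal quarter (1-4) for a fiscal year that starts in February."""
    months_into_year = (day.month - FISCAL_YEAR_START_MONTH) % 12
    return months_into_year // 3 + 1


def is_active(status: str, end_date: date, today: date) -> bool:
    """Mirror the sample rule: status = 'active' and end_date >= today."""
    return status == "active" and end_date >= today


print(fiscal_quarter(date(2025, 4, 30)), fiscal_quarter(date(2025, 1, 15)))
```

If a rule is hard to state this precisely, the text instruction probably needs tightening before Genie can apply it consistently.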
The following is a great use case for an example query:
- User Question: “What are the total sales attributed to each campaign in Q1?”
- Example SQL Answer:
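The original query is not reproduced here. As an illustrative sketch only, the following shows what an example query for this question could look like, run against an in-memory SQLite database in place of Databricks SQL. The table names, column names, and rows are hypothetical, and Q1 is assumed to follow the February-through-April definition from the sample instruction earlier in this lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaigns (campaign_id INTEGER PRIMARY KEY, campaign_name TEXT);
CREATE TABLE attributed_sales (
    campaign_id INTEGER REFERENCES campaigns(campaign_id),
    sale_date TEXT,  -- ISO 8601 date string
    amount REAL
);
INSERT INTO campaigns VALUES (1, 'Spring Webinar'), (2, 'Partner Email');
INSERT INTO attributed_sales VALUES
    (1, '2025-02-10', 1000.0),
    (1, '2025-04-30', 500.0),
    (2, '2025-03-15', 750.0),
    (2, '2025-05-01', 900.0);  -- falls in fiscal Q2, so it is excluded
""")

# Hypothetical example query: total attributed sales per campaign in fiscal Q1.
Q1_SALES_BY_CAMPAIGN = """
SELECT c.campaign_name, SUM(s.amount) AS total_sales
FROM attributed_sales AS s
JOIN campaigns AS c ON c.campaign_id = s.campaign_id
WHERE s.sale_date >= :q1_start AND s.sale_date < :q2_start
GROUP BY c.campaign_name
ORDER BY total_sales DESC
"""

q1_sales = conn.execute(
    Q1_SALES_BY_CAMPAIGN, {"q1_start": "2025-02-01", "q2_start": "2025-05-01"}
).fetchall()
print(q1_sales)
```

The example is useful to Genie because it encodes two things that table metadata alone does not: how sales join to campaigns, and where the fiscal quarter boundaries fall.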
- Leverage trusted assets: Trusted assets are predefined functions and example queries designed to provide verified answers to common user questions. When a user submits a question that triggers a trusted asset, the response will indicate it, adding an extra layer of assurance about the accuracy of the results. We found that some of the best ways to use trusted assets include:
  - For well-established, frequently asked questions that require an exact, verified answer.
  - In high-value or mission-critical scenarios where consistency and precision are non-negotiable.
  - When the question warrants absolute confidence in the response or depends on pre-established logic.
The following is a great use case for a trusted asset:
- Question: “What were the total engagements in the EMEA region for the first quarter?”
- Example SQL Answer (With Parameters):
- Example SQL Answer (Function):
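The original snippets are not reproduced here. The sketch below illustrates the two shapes under stated assumptions: a parameterized query whose logic is fixed while the region and quarter vary, and a function that wraps the same logic behind a named entry point. It runs on SQLite from Python. In a Genie space the function form would presumably be a SQL function, so the Python function is only a stand-in, and the engagements table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE engagements (
    region TEXT,
    fiscal_quarter INTEGER,
    engagement_count INTEGER
);
INSERT INTO engagements VALUES
    ('EMEA', 1, 120), ('EMEA', 1, 80), ('EMEA', 2, 60), ('AMER', 1, 300);
""")

# Parameterized form: the verified logic is fixed, only the values vary.
TOTAL_ENGAGEMENTS_SQL = """
SELECT COALESCE(SUM(engagement_count), 0)
FROM engagements
WHERE region = :region AND fiscal_quarter = :quarter
"""


def total_engagements(region: str, quarter: int) -> int:
    """Function form: wrap the same verified query behind a reusable name."""
    row = conn.execute(
        TOTAL_ENGAGEMENTS_SQL, {"region": region, "quarter": quarter}
    ).fetchone()
    return row[0]


emea_q1 = total_engagements("EMEA", 1)
print(emea_q1)
```

Either way, the point is the same: the filter and aggregation logic is reviewed once, and users only supply the values.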
Lesson 4: Simplify complex logic by preprocessing data
While Genie is a powerful tool capable of interpreting natural language queries and translating them into SQL, it's often more efficient and accurate to preprocess complex logic directly within the dataset. By simplifying the data Genie has to work with, you can improve the quality and reliability of the responses. For example:
- Preprocess complex fields: Instead of giving Genie instructions or examples to parse complex logic, create new columns that simplify the interpretation process.
- Boolean columns: Use Boolean values in new columns to represent complex states. This makes the data more explicit and easier for Genie to understand and query against.
- Pre-join tables: Instead of using multiple, normalized tables that need to be joined together, pre-join these tables into a single, denormalized view. This eliminates the need for Genie to infer relationships or construct complex joins, ensuring all relevant data is accessible in one place and making queries faster and more accurate.
- Leverage Unity Catalog Metric Views (coming soon): Use metric views in Unity Catalog to predefine key performance metrics, such as conversion rates or customer lifetime value. These views ensure consistency by centralizing the logic behind complex calculations, allowing Genie to deliver trusted, standardized results across all queries that reference these metrics.
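As a rough sketch of the pre-join idea from the list above, the following builds a denormalized view once and then answers a question with a single-table query. SQLite stands in for Databricks SQL, and the accounts and campaign_touches tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    account_name TEXT,
    region TEXT
);
CREATE TABLE campaign_touches (
    touch_id INTEGER PRIMARY KEY,
    account_id INTEGER REFERENCES accounts(account_id),
    campaign_name TEXT
);
INSERT INTO accounts VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'AMER');
INSERT INTO campaign_touches VALUES
    (10, 1, 'Spring Webinar'),
    (11, 2, 'Spring Webinar'),
    (12, 1, 'Partner Email');

-- The join is written once, in the view, instead of being inferred per question.
CREATE VIEW campaign_touches_enriched AS
SELECT t.touch_id, t.campaign_name, a.account_name, a.region
FROM campaign_touches AS t
JOIN accounts AS a ON a.account_id = t.account_id;
""")

# A question about EMEA touches now needs no join at all.
emea_touches = conn.execute(
    "SELECT campaign_name, account_name FROM campaign_touches_enriched "
    "WHERE region = 'EMEA' ORDER BY touch_id"
).fetchall()
print(emea_touches)
```

Only the view would be added to the Genie space in this setup, so the generated SQL never has to pick a join key.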
Example: Let's say there is a field called event_status with the values "Registered - In Person," "Registered - Virtual," "Attended - In Person," and "Attended - Virtual." Instead of instructing Genie on how to parse this field or providing numerous example queries, you can create new columns that simplify this data:
- is_registered (True if the event_status includes 'Registered')
- is_attended (True if the event_status includes 'Attended')
- is_virtual (True if the event_status includes 'Virtual')
- is_inperson (True if the event_status includes 'In Person')
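The four flag columns above can be derived in a view, as in this minimal sketch. It uses SQLite from Python as a stand-in for Databricks SQL, and the event_registrations table is hypothetical. SQLite has no dedicated Boolean type, so the flags come back as 1 and 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_registrations (
    registrant_id INTEGER PRIMARY KEY,
    event_status TEXT
);
INSERT INTO event_registrations VALUES
    (1, 'Registered - In Person'),
    (2, 'Registered - Virtual'),
    (3, 'Attended - In Person'),
    (4, 'Attended - Virtual');

-- Each flag isolates one fact that was packed into event_status.
CREATE VIEW event_registrations_flagged AS
SELECT
    registrant_id,
    event_status,
    event_status LIKE 'Registered%' AS is_registered,
    event_status LIKE 'Attended%'   AS is_attended,
    event_status LIKE '%Virtual'    AS is_virtual,
    event_status LIKE '%In Person'  AS is_inperson
FROM event_registrations;
""")

# "How many people attended virtually?" becomes a plain filter on two flags.
virtual_attendees = conn.execute(
    "SELECT COUNT(*) FROM event_registrations_flagged "
    "WHERE is_attended AND is_virtual"
).fetchone()[0]
print(virtual_attendees)
```

With the flags in place, no instruction about parsing event_status is needed, because the string matching happens once in the view.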
Lesson 5: Continuous feedback and refinement
Setting up Genie spaces is not a one-time task. Continuous refinement based on user interactions and feedback is crucial for maintaining accuracy and relevance.
- Monitor interactions: Use Genie’s monitoring tools to review user interactions and identify common points of confusion or error. Encourage users to actively contribute feedback by responding to the prompt “Is this correct?” with “Yes,” “Fix It” or “Request Review.” Further, encourage users to supplement those responses with detailed comments on where improvements or further investigation are needed. This feedback loop is essential for continually refining the Genie space and ensuring that it evolves to better meet the needs of your marketing team.
- Incorporate feedback: Regularly update the space with updated table metadata, example queries, and new instructions based on user feedback. This iterative process helps Genie improve over time.
- Build and run benchmarks: Benchmarks enable systematic accuracy evaluations by comparing Genie's responses to predefined "gold-standard" SQL answers. Running them after data or instruction updates shows where Genie is getting better or worse, guiding targeted refinements. This iterative process ensures reliable insights and helps keep Genie spaces aligned with evolving business needs.
Example: If users frequently get incorrect results when querying segment-specific data, update the instructions to better define segmentation logic and refine the corresponding example queries.
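The comparison behind a benchmark can be sketched as follows. This is not Genie's built-in benchmark feature, only an illustration of the idea: run the generated query and the gold-standard query, then compare their result sets. SQLite stands in for Databricks SQL, and the table, questions, and queries are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE email_sends (campaign_name TEXT, segment TEXT, opens INTEGER);
INSERT INTO email_sends VALUES
    ('Spring Webinar', 'Enterprise', 40),
    ('Spring Webinar', 'SMB', 25),
    ('Partner Email', 'Enterprise', 10);
""")


def results_match(generated_sql: str, gold_sql: str) -> bool:
    """Compare two queries by result set, ignoring row order."""
    generated = sorted(conn.execute(generated_sql).fetchall())
    gold = sorted(conn.execute(gold_sql).fetchall())
    return generated == gold


# Hypothetical benchmark: each case pairs a gold query with the query under test.
benchmark = [
    {
        "question": "Total opens per campaign for the Enterprise segment",
        "gold_sql": "SELECT campaign_name, SUM(opens) FROM email_sends "
                    "WHERE segment = 'Enterprise' GROUP BY campaign_name",
        "generated_sql": "SELECT campaign_name, SUM(opens) FROM email_sends "
                         "WHERE segment = 'Enterprise' GROUP BY campaign_name "
                         "ORDER BY 2 DESC",
    },
    {
        "question": "Total opens per campaign for the SMB segment",
        "gold_sql": "SELECT campaign_name, SUM(opens) FROM email_sends "
                    "WHERE segment = 'SMB' GROUP BY campaign_name",
        # Misses the segment filter, the kind of error a benchmark should flag.
        "generated_sql": "SELECT campaign_name, SUM(opens) FROM email_sends "
                         "GROUP BY campaign_name",
    },
]

failures = [
    case["question"]
    for case in benchmark
    if not results_match(case["generated_sql"], case["gold_sql"])
]
accuracy = 1 - len(failures) / len(benchmark)
print(f"accuracy={accuracy:.0%}, failures={failures}")
```

Tracking the failure list between runs shows which questions regressed after a metadata or instruction change, which is where the next refinement should go.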
Conclusion
Implementing an effective Databricks AI/BI Genie tailored for marketing insights or any other business use case involves a focused, iterative approach. By starting small, thoroughly documenting your data, providing clear instructions and example queries, leveraging trusted assets, and continuously refining your space based on user feedback, you can maximize the potential of Genie to deliver high-quality, accurate answers.
Following these strategies within the Databricks marketing organization, we were able to drive significant improvements. Our Genie usage grew nearly 50% quarter over quarter, while the number of flagged incorrect responses dropped by 25%. This has empowered our marketing team to gain deeper insights, trust the answers, and make data-driven decisions confidently.
Want to learn more?
If you would like to learn more about this use case, you can join Thomas Russell in person at this year’s Data + AI Summit in San Francisco. His session, “How We Turned 200+ Business Users Into Analysts With AI/BI Genie,” is one you won’t want to miss, so be sure to add it to your calendar!
In addition to the key learnings from this blog, there are tons of other articles and videos already published to help you learn more about AI/BI Genie best practices. You can check out the best practices recommended in our product documentation, read a number of related blogs on Medium, or, if you prefer to watch rather than read, check out the videos on YouTube.
You should also check out the blog we created entitled Onboarding your new AI/BI Genie.
If you are ready to explore and learn more about AI/BI Genie and Dashboards in general, you can choose any of the following options:
- Free Trial: Get hands-on experience by signing up for a free trial.
- Documentation: Dive deeper into the details with our documentation.
- Webpage: Visit our webpage to learn more.
- Demos: Watch our demo videos, take product tours, and get hands-on tutorials to see AI/BI in action.
- Training: Get started with free product training through Databricks Academy.
- eBook: Download the Business Intelligence meets AI eBook.
Thanks for reading this far and watch out for more great AI/BI content coming soon!