Multimodal retrieval represents a significant challenge in modern AI systems. Traditional retrieval systems struggle to effectively search across different data types without extensive metadata or tagging. This is particularly problematic for healthcare companies that manage large volumes of diverse content, including text, images, audio, and more, much of which lives in unstructured data sources.
Anyone working in healthcare understands the difficulty of merging unstructured data with structured data. A common example is clinical documentation, where handwritten clinical notes or discharge summaries are often submitted as PDFs, images, and similar formats. These documents must be either transcribed manually or processed with Optical Character Recognition (OCR) to extract the necessary information. Even after this step, you must map the extracted data to your existing structured data to utilize it effectively.
For this blog, we will review the following:
By the end of this blog, you will see how multi-modal embeddings enable the following for healthcare:
An embedding space (AWS | Azure | GCP) is an n-dimensional mathematical representation of records that allows one or more data modalities to be stored as vectors of floating-point numbers. What makes this useful is that in a well-constructed embedding space, records with similar meanings occupy nearby regions of the space. For example, imagine we had a picture of a horse, the word “truck”, and an audio recording of a dog barking. We pass these three completely different data points into our multimodal embedding model and get back the following:
Here is a visual representation of where the numbers would exist in an embedding space:
In practice, embedding space dimensions will be in the hundreds or thousands, but for illustration, let’s use 3-space. We can imagine the first position in these vectors represents “animalness,” the second is “transportation-ness,” and the third is “loudness.” That would make sense given the embeddings, but typically, we do not know what each dimension represents. The important thing is that they represent the semantic meaning of the records.
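To make that concrete, here is a toy sketch in Python. The vector values are hypothetical, chosen only to match the three illustrative dimensions above, and cosine similarity is used to compare them:

```python
import numpy as np

# Toy 3-dimensional embeddings; positions loosely read as
# "animalness", "transportation-ness", and "loudness".
horse_image    = np.array([0.9, 0.1, 0.3])  # picture of a horse
truck_text     = np.array([0.1, 0.9, 0.4])  # the word "truck"
dog_bark_audio = np.array([0.9, 0.1, 0.8])  # audio of a dog barking

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(horse_image, dog_bark_audio))  # high: both records are about animals
print(cosine(horse_image, truck_text))      # lower: very different meanings
```

Even though the horse comes from an image and the barking from audio, their vectors land close together, which is exactly the property cross-modal search relies on.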
There are several ways to create a multimodal embedding space, including training multiple encoders simultaneously (such as CLIP), using cross-attention mechanisms (such as DALL-E), or using various post-training alignment methods. These methods allow the record's meaning to transcend the original modality and occupy a shared space with other disparate records or formats.
This shared semantic space is what enables powerful cross-modal search capabilities. When a text query and an image share similar vector representations, they likely share similar semantic meanings, allowing us to find relevant images based on textual descriptions without explicit tags or metadata.
To effectively implement multimodal search, we need models that can generate embeddings for different data types within a shared vector space. These models are specifically designed to understand the relationships between different modalities and represent them in a unified mathematical space.
Several powerful multimodal embedding models are available as of June 2025:
At Databricks, we provide the infrastructure and tools to host, evaluate, and develop an end-to-end solution, customizable to your use case. Consider the following scenarios as you begin deploying this use case:
For the full implementation of this solution, please visit the repo: Github Link
This example uses synthetic patient information as our structured data and sample explanations of benefits in PDF format as our unstructured data. First, synthetic data is generated for use with a Genie Space. Then Nomic's state-of-the-art open source multimodal embedding model is loaded onto Databricks Model Serving to generate embeddings for sample explanations of benefits found online.
This process sounds complicated, but Databricks provides built-in tools that enable a complete, end-to-end solution. At a high level, the process looks like the following:
This Genie Space will be used as a tool that converts natural language into SQL queries against our structured data.
In this example, the Faker library will be used to generate random patient information. We will create two tables to diversify our data: Patient Visits and Practice Locations, with columns such as reasons for visit, insurance providers, and insurance types.
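As a hedged illustration of that step (column names and table locations are placeholders; the repo defines the full schema), the generation for Patient Visits might look like this:

```python
import random
from faker import Faker

fake = Faker()

# Illustrative value pools; providers here are fictional.
visit_reasons = ["Annual physical", "Flu symptoms", "Back pain", "Follow-up"]
insurance_providers = ["Acme Health", "Sunrise Insurance", "Bluebird Care"]
insurance_types = ["HMO", "PPO", "EPO"]

patient_visits = [
    {
        "patient_name": fake.name(),
        "visit_date": fake.date_between(start_date="-1y", end_date="today"),
        "reason_for_visit": random.choice(visit_reasons),
        "insurance_provider": random.choice(insurance_providers),
        "insurance_type": random.choice(insurance_types),
    }
    for _ in range(1000)
]

# `spark` is available by default in Databricks notebooks.
# Save as a Delta table so the Genie Space can query it.
spark.createDataFrame(patient_visits).write.mode("overwrite").saveAsTable(
    "main.healthcare.patient_visits"  # placeholder catalog.schema
)
```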
To query data using natural language, we can utilize a Databricks Genie Space (AWS | Azure | GCP) to convert our question into SQL and retrieve relevant patient data. In the Databricks UI, simply click the Genie tab in the left bar → New → select the patient_visits and practice_locations tables.
We need the Genie Space ID, which is the value that comes after rooms in the Genie Space URL. You can see an example below:
Since we are using DSPy, all we need to do is define a Python function.
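For example, a minimal sketch of such a function, assuming the Databricks SDK's Genie conversation API and a placeholder space ID, could look like the following; the exact response parsing will depend on your SDK version:

```python
from databricks.sdk import WorkspaceClient

GENIE_SPACE_ID = "<value-after-rooms-in-the-url>"  # placeholder

def query_patient_data(question: str) -> str:
    """Ask the Genie Space a natural-language question about the patient tables."""
    w = WorkspaceClient()
    # Start a Genie conversation and wait for the response.
    message = w.genie.start_conversation_and_wait(
        space_id=GENIE_SPACE_ID, content=question
    )
    # The returned message carries the generated SQL and query results;
    # extract the pieces your agent needs before handing them to the LLM.
    return str(message)
```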
That’s it! Let’s set up the Multi-Modal Generation workflow now.
For this step, we will use the fully open colNomic-embed-multimodal-7b model on HuggingFace to generate embeddings for our unstructured data, in this case, PDFs. We selected Nomic’s model due to its Apache 2.0 license and high performance on benchmarks.
The method for generating your embeddings will vary depending on your use case and modality. Review the Databricks Vector Search Best Practices (AWS | Azure | GCP) to understand what is best for your use case.
We need this model to be available within Databricks Unity Catalog (UC), so we will use MLflow to load it from Huggingface and register it. Then, we can deploy the model to a model-serving endpoint.
The Python model includes additional logic to handle image inputs, which can be found in the complete repository.
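As a rough sketch of that flow (catalog, schema, and model names below are placeholders, and the model-loading details are omitted), the MLflow registration might look like:

```python
import mlflow

# Register to Unity Catalog rather than the workspace registry.
mlflow.set_registry_uri("databricks-uc")

class NomicEmbedder(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load colNomic-embed-multimodal-7b from Hugging Face here
        # (e.g., with the transformers library) and keep it on self.model.
        ...

    def predict(self, context, model_input):
        # Return one embedding per input record (text or base64-encoded image).
        ...

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="multimodal_embedder",
        python_model=NomicEmbedder(),
        registered_model_name="main.healthcare.colnomic_embed_multimodal",  # placeholder UC name
    )
```

Once registered, the UC model can be deployed to a Model Serving endpoint from the UI or the serving API.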
UC Volumes are designed like file systems to host any file and are where we store our unstructured data. You can use them in the future to store other files, such as images, and repeat the process as needed. This includes the model above. In the repository, you will see that the cache refers to a volume.
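For instance, a Volume for the raw documents could be created and inspected like this (names are placeholders):

```python
# Create a Volume under an existing catalog/schema, then list its contents.
spark.sql("CREATE VOLUME IF NOT EXISTS main.healthcare.raw_docs")
display(dbutils.fs.ls("/Volumes/main/healthcare/raw_docs/"))
```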
You will have a folder called sample_pdf_sbc containing some example summaries of benefits and coverage. We need to prepare these PDFs to embed them.
The colNomic-embed-multimodal-7b model is specifically trained to understand text and visual elements together within a single image, a common input when working with PDFs. This allows the model to perform exceptionally well at retrieving these pages.
This approach lets you use all of the content within a PDF without needing a text chunking strategy to make retrieval work effectively; the model embeds the page images directly into the shared embedding space.
We will use pdf2image to convert each page of the PDF into an image, preparing it for embedding.
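A minimal sketch of that conversion, with placeholder Volume paths, might look like the following (pdf2image requires the poppler utilities to be installed on the cluster):

```python
from pdf2image import convert_from_path

pdf_path = "/Volumes/main/healthcare/raw_docs/sample_pdf_sbc/example_sbc.pdf"  # placeholder
pages = convert_from_path(pdf_path, dpi=200)  # one PIL.Image per PDF page

# Optionally persist the rendered pages back to the Volume so the agent can
# display the exact page it retrieved later.
for i, page in enumerate(pages):
    page.save(f"/Volumes/main/healthcare/raw_docs/page_images/example_sbc_{i}.png")
```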
Now that we have the PDF images, we can generate the embeddings. At the same time, we can save the embeddings to a Delta table with additional columns that we will retrieve alongside our Vector Search, like the file path to the Volume location.
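Here is a hedged sketch building on the pages rendered above; the endpoint and table names are placeholders, and the exact request/response shape depends on how the pyfunc model was defined:

```python
import base64, io
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

rows = []
for i, page in enumerate(pages):
    # Encode each page image so it can be sent to the serving endpoint.
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    image_b64 = base64.b64encode(buf.getvalue()).decode("utf-8")

    response = client.predict(
        endpoint="colnomic-embed-multimodal",   # placeholder serving endpoint
        inputs={"inputs": [image_b64]},
    )
    rows.append(
        {
            "image_path": f"/Volumes/main/healthcare/raw_docs/page_images/example_sbc_{i}.png",
            "embedding": response["predictions"][0],
        }
    )

# Persist embeddings plus the Volume path we will retrieve alongside the search results.
spark.createDataFrame(rows).write.mode("append").saveAsTable(
    "main.healthcare.sbc_page_embeddings"  # placeholder Delta table
)
```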
Creating a Vector Search index can be done via UI or API. The API method is shown below.
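A sketch of the API call for a Delta Sync index over the embeddings table, using self-managed embeddings and placeholder names, might look like this:

```python
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()

index = vsc.create_delta_sync_index(
    endpoint_name="healthcare_vs_endpoint",                   # placeholder Vector Search endpoint
    index_name="main.healthcare.sbc_page_embeddings_index",   # placeholder
    source_table_name="main.healthcare.sbc_page_embeddings",  # Delta table from the previous step
    pipeline_type="TRIGGERED",
    primary_key="image_path",
    embedding_vector_column="embedding",
    embedding_dimension=128,  # set this to your embedding model's output dimension
)
```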
Now we just need to tie it all together with an Agent.
We use DSPy for this because of its declarative, pure Python design. It allows us to iterate and develop quickly, testing various models to see which ones will work best for our use case. Most importantly, the declarative nature allows us to modularize our Agent so that we can isolate the Agent’s logic from the tools and focus on defining HOW the agent should accomplish its task.
And the best part? No manual prompt engineering!
This signature specifies and enforces the inputs and outputs, while also describing how the step should behave.
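For illustration, a signature for the final answering step might look like the following; the field names are assumptions rather than the repo's exact definitions:

```python
import dspy

class AnswerPatientInsuranceQuestion(dspy.Signature):
    """Answer a patient coverage question using structured visit data and
    retrieved explanation-of-benefits pages."""

    question: str = dspy.InputField()
    patient_records: str = dspy.InputField(desc="Rows returned by the Genie Space tool")
    benefit_pages: str = dspy.InputField(desc="Content retrieved from the Vector Search index")
    answer: str = dspy.OutputField(desc="A grounded answer that cites the retrieved context")
```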
The module will take the instructions from the signature and create an optimal prompt to send to the LLM. For this particular use case, we will build a custom module called `MultiModalPatientInsuranceAnalyzer()`.
This custom module breaks the signatures out into steps inside its forward method, effectively “chaining” the calls together. We follow this process:
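At a high level, the forward method asks the Genie Space for structured patient data, retrieves the most relevant benefit pages from the Vector Search index, and then synthesizes an answer. A rough sketch, building on the earlier snippets (the class layout and the search_benefit_pages helper are assumptions; see the repo for the full implementation):

```python
import dspy

class MultiModalPatientInsuranceAnalyzer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought(AnswerPatientInsuranceQuestion)

    def forward(self, question: str):
        # Step 1: pull structured patient data through the Genie Space tool.
        patient_records = query_patient_data(question)
        # Step 2: retrieve the most relevant benefit pages from the Vector Search
        # index (search_benefit_pages is a hypothetical retrieval helper).
        benefit_pages = search_benefit_pages(question)
        # Step 3: chain both results into the answering signature.
        return self.answer(
            question=question,
            patient_records=patient_records,
            benefit_pages=str(benefit_pages),
        )
```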
Review what tools the Agent used and the reasoning the Agent went through to answer the question.
Once you have a working Agent, we recommend the following:
The evaluation framework will be crucial in understanding how effectively the Vector Search index retrieves relevant information for your RAG agent. By tracking these metrics, you will know where to make adjustments, from changing the embedding model to adjusting the prompts sent to the LLM.
You should also monitor whether the Foundation Model API (AWS | Azure | GCP) is sufficient for your use case. At a certain point, you will reach the API limits for the Foundation Model APIs, and you will need to transition to Provisioned Throughput (AWS | Azure | GCP) for a more reliable endpoint for your LLM.
Furthermore, keep a close eye on your serverless model serving costs (AWS | Azure | GCP). Most of this solution's costs will come from the serverless model serving SKU and may grow as you scale up.
Check out these blogs to understand how to do this on Databricks.
In addition, Databricks Delivery Solutions Architects (DSAs) help accelerate Data and AI initiatives across organizations. DSAs provide architectural leadership, optimize platforms for cost and performance, enhance developer experience, and drive successful project execution. They bridge the gap between initial deployment and production-grade solutions, working closely with various teams, including data engineering, technical leads, executives, and other stakeholders to ensure tailored solutions and faster time to value. Contact your Databricks Account Team to learn more.
Get started by building your own GenAI app! Check out the documentation to begin.
At Databricks, you have all the tools you need to develop this end-to-end solution. Check out the blogs below to learn about managing and working with your new Agent with the Mosaic AI Agent Framework.