Deploy Your LLM Chatbot With Retrieval Augmented Generation (RAG), llama2-70B (MosaicML Inference) and Vector Search
What you’ll learn
LLMs are disrupting the way we interact with information, from internal knowledge bases to external, customer-facing documentation or support.
In this tutorial, we will cover how Databricks is uniquely positioned to help you build your own chatbot using Retrieval Augmented Generation (RAG) and deploy a real-time Q&A bot using Databricks serverless capabilities. We will leverage llama2-70B-Chat to answer our questions via the MosaicML Inference API.
RAG is a powerful technique where we enrich the LLM prompt with additional context specific to your domain so that the model can provide better answers.
This technique provides excellent results using public models without having to deploy and fine-tune your own LLMs.
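The core idea can be sketched in a few lines: retrieve context relevant to the user's question, then prepend it to the prompt sent to the LLM. This is a minimal, self-contained illustration of that prompt-augmentation step; the helper name, question, and context snippet are hypothetical, not part of the demo's code.

```python
# Minimal sketch of RAG prompt augmentation. In the actual demo,
# context_docs would come from a Vector Search similarity query.
def build_rag_prompt(question: str, context_docs: list) -> str:
    """Enrich the LLM prompt with domain-specific context."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "How do I create a cluster?",
    ["Clusters are created from the Compute tab in the workspace UI."],
)
print(prompt)
```

Because the model only has to ground its answer in the supplied context, a general-purpose public model can answer domain-specific questions without fine-tuning.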
You will learn how to:
- Prepare clean documents to build your internal knowledge base and specialize your chatbot
- Leverage Databricks Vector Search with AI Gateway to create and store document embeddings
- Search similar documents from our knowledge database with Vector Search
- Deploy a real-time model that uses RAG, providing augmented context in the prompt
- Leverage the llama2-70B-Chat model through AI Gateway using a fully managed MosaicML endpoint
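To make the retrieval step above concrete: similarity search ranks stored document embeddings by how close they are to the question's embedding, typically by cosine similarity. The sketch below uses made-up 2-D vectors purely for illustration; in the demo, real embeddings are produced by an embedding model and stored in Databricks Vector Search.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query,
    ranked by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                     # cosine similarity of each doc vs. query
    return np.argsort(-sims)[:k]     # indices of the k highest scores

# Toy embeddings: docs 0 and 1 point in nearly the same direction as the query.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.05])
print(top_k(query, docs))  # -> [0 1]
```

The documents returned by this lookup are what gets injected into the prompt before calling the chat model.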
To run the demo, get a free Databricks workspace and execute the following two commands in a Python notebook:
```python
%pip install dbdemos

import dbdemos
dbdemos.install('llm-rag-chatbot')
```
Disclaimer: This tutorial leverages features that are currently in private preview. Databricks Private Preview terms apply.
For more details, open the introduction notebook.