Skip to main content
Engineering blog

Llama 2 Foundation Models Available in Databricks Lakehouse AI

Share this post

We’re excited to announce that Meta AI’s Llama 2 foundation chat models are available in the Databricks Marketplace for you to fine-tune and deploy on private model serving endpoints. The Databricks Marketplace is an open marketplace that enables you to share and exchange data assets, including datasets and notebooks, across clouds, regions, and platforms. Adding to the data assets already offered on Marketplace, this new listing provides instant access to Llama 2's chat-oriented large language models (LLM), from 7 to 70 billion parameters, as well as centralized governance and lineage tracking in the Unity Catalog. Each model is wrapped in MLflow to make it easy for you to use the MLflow Evaluation API in Databricks notebooks as well as to deploy with a single-click on our LLM-optimized GPU model serving endpoints. 

Databricks Marketplace
Databricks Marketplace

What is Llama 2?

Llama 2 is Meta AI’s family of generative text models that are optimized for chat use cases. The models have outperformed other open models and represent a breakthrough where fine-tuned open models can compete with OpenAI’s GPT-3.5-turbo. 

Llama 2 on Lakehouse AI

You can now get a secure, end-to-end experience with Llama 2 on the Databricks Lakehouse AI platform:

  1. Access on Databricks Marketplace. You can preview the notebook, and get instant access to the chat-family of Llama 2 models from the Databricks Marketplace. The Marketplace makes it easy to discover and evaluate state-of-the-art foundation models you can manage in the Unity Catalog. 
  2. Centralized Governance in the Unity Catalog. Because the models are now in a catalog, you automatically get all the centralized governance, auditing, and lineage tracking that comes with the Unity Catalog for your Llama 2 models.
  3. Deploy in One-Click with Optimized GPU Model Serving. We packaged the Llama 2 models in MLflow so you can have single-click deployment to privately host your model with Databricks Model Serving. The now Public Preview of GPU Model Serving is optimized to work with large language models to provide low latencies and support high throughput. This is a great option for use cases that leverage sensitive data or in cases where you cannot send customer data to third-parties. 
  4. Saturate Private Endpoints with AI Gateway. Privately hosting models on GPUs can be expensive, which is why we recommend leveraging the MLflow AI Gateway to create and distribute routes for each use case in your organization to saturate the endpoint. The AI Gateway now supports rate limiting for cost control in addition to secure credential management of Databricks Model Serving endpoints and externally-hosted SaaS LLMs. 
  5. Try it yourself: Launch the product tour to see how to serve Llama 2 models from Databricks Marketplace
pick the model
Select the Llama 2 Model from Marketplace

Get Started with Generative AI on Databricks

The Databricks Lakehouse AI platform enables developers to rapidly build and deploy generative AI applications with confidence.

Stay tuned for more exciting announcements soon!