Databricks Introduces New Generative AI Tools, Investing in Lakehouse AI
June 28, 2023
Databricks’ data-centric approach to AI makes it easier to build, deploy and manage large language model (LLM) applications, enabling customers to accelerate their generative AI journey
San Francisco, CA — June 28, 2023 — At the sold-out Data + AI Summit, Databricks, the Data and AI company, today announced new Lakehouse AI innovations that allow customers to easily and efficiently develop generative AI applications, including large language models (LLMs), directly within the Databricks Lakehouse Platform. Lakehouse AI offers a unique, data-centric approach to AI, with built-in capabilities for the entire AI lifecycle and underlying monitoring and governance. New features that will help customers more easily implement generative AI use cases include: Vector Search, a curated collection of open source models, LLM-optimized Model Serving, MLflow 2.5 with LLM capabilities such as AI Gateway and Prompt Tools, and Lakehouse Monitoring.
The demand for generative AI is driving disruption across industries, creating urgency for technical teams to build generative AI models and LLMs on top of their own data to differentiate their offerings. However, data determines success with AI, and when the data platform is separate from the AI platform, it’s difficult to enforce and maintain clean, high-quality data. Additionally, the process of getting a model from experimentation to production, and the related tuning, operationalizing, and monitoring of the models, is complex and unreliable.
With Lakehouse AI, Databricks unifies the data and AI platform, so customers can develop their generative AI solutions faster and more successfully – from using foundational SaaS models to training their own custom models securely with their enterprise data. By bringing together data, AI models, LLM operations (LLMOps), monitoring and governance on the Databricks Lakehouse Platform, organizations can accelerate their generative AI journey.
“At JetBlue, we inspire humanity through our product, culture and customer service. We’ve embarked on an AI transformation over the past year because we believe AI, and in particular LLMs, can fuel increased productivity and better customer experience for our travelers,” said Sai Ravuru, Senior Manager of Data Science and Analytics at JetBlue. “Databricks has been instrumental in our AI and ML transformation and has helped us build our own LLM, enabling our team to more effectively use the BlueSky platform to make decisions using real-time streams of weather, aircraft sensors, FAA data feeds and more. The deployment is significantly improving our onboarding time for new users. We’re excited about all of Databricks’ data-centric AI innovations, enabling customers like us to build LLMs in the lakehouse and govern them from there.”
Offering the Best Data Platform to Develop Generative AI Solutions
Lakehouse AI unifies the AI lifecycle, from data collection and preparation, to model development and LLMOps, to serving and monitoring. Newly announced capabilities include:
- Vector Search: Databricks Vector Search enables developers to improve the accuracy of their generative AI responses through embeddings search. It will fully manage and automatically create vector embeddings from files in Unity Catalog — Databricks’ flagship solution for unified search and governance across data, analytics and AI — and keep them updated automatically through seamless integrations with Databricks Model Serving. Additionally, developers can add query filters to provide even better outcomes for their users.
- Fine-tuning in AutoML: Databricks AutoML now brings a low-code approach to fine-tuning LLMs. Customers can securely fine-tune LLMs using their own enterprise data and they will own the resulting model that’s produced by AutoML, without having to send data to a third party. Additionally, with MLflow, Unity Catalog and Model Serving integrations, the model can be easily shared within an organization, governed for appropriate use, served for inference in production and monitored.
- Curated open source models, backed by optimized Model Serving for high performance: Databricks has published a curated list of open source models available within Databricks Marketplace — including MPT-7B and Falcon-7B instruction-following and summarization models, and Stable Diffusion for image generation — making it easy to get started with generative AI across a variety of use cases. Lakehouse AI capabilities like Databricks Model Serving have been optimized for these models to ensure peak performance and cost optimization.
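The embeddings search idea behind Vector Search — rank stored vectors by similarity to a query embedding, optionally narrowed by metadata filters — can be sketched in plain Python. This is a generic illustration of the technique only; the index contents, field names and `search` function below are made up for the example and are not the Databricks Vector Search API:

```python
import math

# Toy in-memory "vector index": each entry pairs an embedding with metadata.
# In the managed service, embeddings would be generated and kept in sync
# automatically; here they are hard-coded for illustration.
INDEX = [
    {"id": "doc1", "embedding": [0.9, 0.1, 0.0], "lang": "en"},
    {"id": "doc2", "embedding": [0.1, 0.9, 0.0], "lang": "en"},
    {"id": "doc3", "embedding": [0.8, 0.2, 0.1], "lang": "de"},
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_embedding, top_k=2, filters=None):
    """Return ids of the top_k most similar entries, optionally filtered on metadata."""
    candidates = [
        e for e in INDEX
        if not filters or all(e.get(k) == v for k, v in filters.items())
    ]
    ranked = sorted(candidates,
                    key=lambda e: cosine(query_embedding, e["embedding"]),
                    reverse=True)
    return [e["id"] for e in ranked[:top_k]]

print(search([1.0, 0.0, 0.0], top_k=2, filters={"lang": "en"}))  # ['doc1', 'doc2']
```

The metadata filter runs before ranking, which mirrors how query filters narrow the candidate set before similarity scoring.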
Managing LLMOps Effectively and Reliably
Databricks also unveiled new LLMOps innovations with the announcement of MLflow 2.5, the latest release of MLflow, the popular Linux Foundation open source project and one of the company’s flagship contributions to open source. MLflow is an open source platform for the machine learning lifecycle that sees nearly 11 million monthly downloads. MLflow 2.5 updates include:
- MLflow AI Gateway: MLflow AI Gateway enables organizations to centrally manage credentials for SaaS models or model APIs and provide access-controlled routes for querying. Organizations can then provide these routes to various teams to integrate into their workflows or projects. Developers can easily swap out the backend model at any time to improve cost and quality, and switch across LLM providers. MLflow AI Gateway will also enable prediction caching to track repeated prompts and rate limiting to manage costs.
- MLflow Prompt Tools: New, no-code visual tools allow users to compare various models’ output based on a set of prompts, which are automatically tracked within MLflow. With integration into Databricks Model Serving, customers can deploy the relevant model to production.
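The routing idea behind the AI Gateway — callers reference a named route while credentials and the backend model stay centrally managed, so the backend can be swapped without changing downstream code — can be sketched as follows. The backend functions and the `query` interface here are illustrative stand-ins, not the real MLflow AI Gateway API:

```python
# Stand-ins for backends a gateway route might point at: a SaaS model API
# (whose credentials would live only in the gateway) and a self-hosted model.
def saas_backend(prompt):
    return f"[saas] {prompt}"

def local_llm_backend(prompt):
    return f"[local] {prompt}"

# Central route table: teams are given route names, never raw credentials.
ROUTES = {"completions": saas_backend}

def query(route, prompt):
    """Callers use a route name; the backend behind it can change freely."""
    return ROUTES[route](prompt)

print(query("completions", "hello"))       # [saas] hello
ROUTES["completions"] = local_llm_backend  # swap the backend; callers unchanged
print(query("completions", "hello"))       # [local] hello
```

A production gateway would add the access control, prediction caching and rate limiting described above on top of this routing layer.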
Additionally, following its release earlier this year, Databricks Model Serving has been optimized for LLM inference, delivering up to 10x lower latency and reduced costs. Fully managed by Databricks to offer frictionless infrastructure management, Model Serving now enables GPU-based inference support. It auto-logs and monitors all requests and responses to Delta Tables and ensures end-to-end lineage tracking through Unity Catalog. Finally, Model Serving quickly scales up from zero and back down as demand changes, reducing operational costs and ensuring customers pay only for the compute they use.
Intelligent Monitoring Across Data and AI Assets
Databricks also expanded its data and AI monitoring capabilities with the introduction of Databricks Lakehouse Monitoring to better monitor and manage all data and AI assets within the Lakehouse. Databricks Lakehouse Monitoring provides end-to-end visibility into data pipelines, to continuously monitor, tune and improve performance, without additional tools and complexity. By taking advantage of Unity Catalog, Lakehouse Monitoring provides users with deep insight into the lineage of their data and AI assets to ensure high quality, accuracy and reliability. Proactive detection and reporting will make it easy to spot and diagnose errors in pipelines, automatically perform root cause analysis and quickly find recommended solutions across the data lifecycle.
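The kind of continuous check a monitoring layer like this performs can be illustrated with a minimal drift test: compare a simple statistic on incoming data against a baseline and raise an alert when it moves too far. The metric, threshold and function names below are invented for the sketch; Lakehouse Monitoring automates such checks across governed data and AI assets:

```python
# Minimal data-quality monitoring sketch (illustrative only): flag a column
# whose null rate drifts from its baseline beyond a tolerance.

def null_rate(rows, column):
    """Fraction of rows where `column` is missing."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def check_drift(baseline_rows, current_rows, column, tolerance=0.05):
    """Compare the null rate against the baseline and flag excess drift."""
    drift = abs(null_rate(current_rows, column) - null_rate(baseline_rows, column))
    return {"column": column, "drift": drift, "alert": drift > tolerance}

baseline = [{"price": 10.0}, {"price": 12.5}, {"price": 9.9}, {"price": 11.0}]
current = [{"price": 10.2}, {"price": None}, {"price": None}, {"price": 11.3}]
print(check_drift(baseline, current, "price"))
# {'column': 'price', 'drift': 0.5, 'alert': True}
```

Root cause analysis and lineage tracking, as described above, build on exactly this kind of per-asset signal collected continuously across the pipeline.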
“We’ve reached an inflection point for organizations: leveraging AI is no longer aspirational — it is imperative for organizations to remain competitive,” said Ali Ghodsi, Co-Founder and CEO at Databricks. “Databricks has been on a mission to democratize data and AI for more than a decade and we’re continuing to innovate as we make the lakehouse the best place for building, owning and securing generative AI models.”
Databricks continues to expand the Lakehouse Platform, recently announcing Lakehouse Apps and the general availability of Databricks Marketplace, LakehouseIQ, new governance capabilities, and Delta Lake 3.0.
MLflow 2.5 features will be available in the July release of MLflow. New Databricks capabilities including Vector Search and Lakehouse Monitoring are currently in preview.
To learn more about Lakehouse AI, watch the Data + AI Summit live: https://www.databricks.com/dataaisummit/watch
About Databricks
Databricks is the Data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, and over 50% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Delta Lake, Apache Spark™, and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. To learn more, follow Databricks on Twitter, LinkedIn, and Facebook.
Contact: [email protected]
Safe Harbor Statement
This information is provided to outline Databricks’ general product direction and is for informational purposes only. Customers who purchase Databricks services should make their purchase decisions relying solely upon services, features, and functions that are currently available. Unreleased features or functionality described in forward-looking statements are subject to change at Databricks’ discretion and may not be delivered as planned or at all.