We're excited to announce the General Availability of Databricks Predictive Optimization. This capability intelligently optimizes your table data layouts for faster queries and reduced storage costs.
Predictive Optimization harnesses Unity Catalog and is powered by the Data Intelligence Engine to determine the best optimizations to perform on your data and run those operations automatically on serverless infrastructure.
Where previously data teams needed to manually manage maintenance operations, the Databricks Data Intelligence Platform does that for you, reducing management complexity and improving performance and cost-efficiency out of the box.
Get started today by enabling Predictive Optimization from your account console.
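If you'd rather script the rollout than click through the console, Databricks SQL also lets you set Predictive Optimization at the catalog or schema level. The sketch below only builds the statement text. The helper name `predictive_optimization_sql` and the catalog and schema names are hypothetical, and it assumes you would run the resulting string yourself (for example with `spark.sql(...)` in a Unity Catalog-enabled workspace).

```python
# Hypothetical helper: builds the SQL text only, it does not connect to Databricks.
VALID_LEVELS = {"CATALOG", "SCHEMA"}
VALID_ACTIONS = {"ENABLE", "DISABLE", "INHERIT"}


def predictive_optimization_sql(level: str, name: str, action: str = "ENABLE") -> str:
    """Return an ALTER ... PREDICTIVE OPTIMIZATION statement for a catalog or schema."""
    level = level.upper()
    action = action.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}")
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {sorted(VALID_ACTIONS)}")
    return f"ALTER {level} {name} {action} PREDICTIVE OPTIMIZATION"


if __name__ == "__main__":
    # "main" and "main.sales" are placeholder names, not real objects.
    print(predictive_optimization_sql("catalog", "main"))
    print(predictive_optimization_sql("schema", "main.sales", "inherit"))
```

Using `INHERIT` on a schema makes it follow whatever its parent catalog is set to, which keeps per-schema overrides to a minimum.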
Proper table maintenance significantly improves query performance and cost efficiency by optimizing the data lake for your organization's unique needs. However, getting this right requires technical expertise, manual overhead, and continuous adjustments as your organization's data and use cases evolve.
Data engineering teams need to figure out which optimizations to run, which tables to run them on, and how often to run them.
Once these questions are answered, teams must then manage the operational overhead of running these optimizations, such as scheduling jobs, diagnosing failures, and managing the underlying infrastructure.
Furthermore, this is not a one-time setup – teams must continuously update these jobs when data grows, new tables are added, and access patterns change. As data and AI use cases have exploded within organizations, many customers have shared that they are unable to keep up with optimizing tables created by expanding business needs.
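To make that overhead concrete, here is a minimal sketch of the kind of hand-rolled maintenance script teams end up owning. It assumes a fixed, manually curated table list and a single retention window. The function name `maintenance_statements` and the table names are hypothetical, and the script only generates `OPTIMIZE` and `VACUUM` statements rather than executing them.

```python
from typing import Iterable, List


def maintenance_statements(tables: Iterable[str], retain_hours: int = 168) -> List[str]:
    """Return the OPTIMIZE and VACUUM statements a manual job would run per table.

    retain_hours defaults to 168 (7 days), the usual default VACUUM retention window.
    """
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    statements: List[str] = []
    for table in tables:
        # Compact small files (and cluster, for Liquid Clustering tables).
        statements.append(f"OPTIMIZE {table}")
        # Remove data files no longer referenced by the table.
        statements.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return statements


if __name__ == "__main__":
    # Placeholder table names; every new table has to be added here by hand.
    for stmt in maintenance_statements(["main.sales.orders", "main.sales.customers"]):
        print(stmt)
```

Note what the sketch leaves out: deciding which tables actually benefit, choosing a cadence per table, and handling failures. Those are exactly the parts that grow with every new table and workload.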
With Predictive Optimization, Databricks takes care of all of this for you with AI and Unity Catalog, enabling you to focus on driving business value.
Predictive Optimization intelligently determines the best schedule of optimizations by leveraging Unity Catalog and the Data Intelligence Engine. Our AI model combines your organization's query patterns with factors such as data layout, table properties, and performance characteristics to determine the most impactful optimizations to run.
For many customers, the impact and ROI are immediate. For example, the team at Plenitude, a large energy company, saw significant benefits soon after enabling Predictive Optimization.
"Databricks Predictive Optimization consistently helps the FinOps group minimize storage costs. We've immediately seen a 26% drop in storage costs, and we expect additional incremental savings going forward. The capability has enabled us to retire procedures, scripts, and manual maintenance operations, allowing us to achieve greater out-of-the-box scalability." — Alessandro Caronia, Infrastructure Operations Manager and Simona Fiazza, End to End Operations Manager at Plenitude
Predictive Optimization also automatically learns and adjusts to your data usage patterns. The intelligence engine learns from your organization's usage over time. It ensures that your data is always stored in the most efficient layout, translating to cost savings and performance gains without the need for continuous manual intervention.
This self-driving system fully replaces manual solutions, like the one at Toloka AI, an AI data annotation platform.
"Thanks to Predictive Optimization (PO), we were able to decommission our DIY solution for table maintenance. PO is more efficient and cost-effective, as it optimizes only the tables that benefit from maintenance operations. PO simplifies our data platform, allowing for better allocation of resources and a more streamlined data management process." — Nikita Bochkarev, Senior Data Engineer at Toloka AI
New since Preview, Predictive Optimization now automatically runs OPTIMIZE on tables with Liquid Clustering, in addition to VACUUM and compaction. You no longer have to schedule or determine the frequency of clustering – Predictive Optimization will cluster at an optimal cadence for better query performance.
Since launching as a Preview, Predictive Optimization has intelligently run optimizations over hundreds of thousands of tables comprising exabytes of data. These optimizations improve query performance by optimizing file size and layout on disk and have generated millions in annual storage savings for customers.
Preview customers like Anker have reported 2x improvements in query performance and 50% storage savings.
"Databricks' Predictive Optimizations intelligently optimized our Unity Catalog storage, which saved us 50% in annual storage costs while speeding up our queries by >2x. It learned to prioritize our largest and most-accessed tables. And, it did all of this automatically, saving our team valuable time."
