How we got here

Your data warehouse wasn’t built for today’s world

Like the CD, disposable camera, floppy disk and most other 40-year-old innovations, the data warehouse had a great run. But new use cases have spawned new technologies. CDs can’t stream music. Film cameras can’t share photos. Floppy disks can’t compete with infinite cloud storage. And data warehouses can’t perform AI.

It’s time for a simpler approach

AI is a priority for every organization. But today’s complex and outdated legacy infrastructure can’t deliver on the promise of AI. It’s time for a new data architecture built to meet your needs today — and future-proofed so it’s ready for whatever tomorrow brings.

Dawn of the Lakehouse

A new era of data and AI opens

The data lakehouse is an open data architecture that combines the best of data warehouses and data lakes on one platform.

Now you can store all your data — structured, semi-structured and unstructured — in your open data lake and still get the data quality, performance, security and governance you expect from a data warehouse. This makes lakehouse the only data architecture that supports business intelligence, SQL analytics, real-time data applications, data science and machine learning in one platform.

Anatomy of a Lakehouse

One platform for all use cases

Delta Lake

The essential ingredients

Delta Lake is an open-source project that delivers reliability, security and performance on your data lake. It is essential to building a lakehouse architecture on top of existing storage systems such as Amazon S3, Azure Data Lake Storage and Google Cloud Storage.

Delta Lake is stored in an open data format so you avoid data lock-in from proprietary formats and gain access to a vast open source ecosystem. Today, thousands of companies are processing exabytes of data per month with Delta Lake.




How lakehouse compares

Lakehouses do what warehouses can’t

Lakehouse leapfrogs the limitations of the data warehouse because it’s designed to manage all types of data while supporting both traditional data warehouse workloads and machine learning natively. It adds all this functionality to your existing data lake, creating a single open system to both manage all of your data and support every use case.

| | Data Warehouse | Lakehouse |
| Data formats | Closed | Open |
| Data types | Structured* | Any type of data |
| Scalability | Limited** | Highly scalable |
| Cost | $$$ | $ |
| Use cases | BI, SQL | BI, SQL, ML, Real-Time Apps |
| Data access | SQL only | Open APIs for direct access to files with SQL, R, Python and other languages |
| Reliability | High-quality, reliable data with ACID transactions | High-quality, reliable data with ACID transactions |
| Governance | Fine-grained security and governance for row/columnar level for tables | Fine-grained security and governance for row/columnar level for tables |
| Performance | High | High |

*Limited support for semi-structured data
**Cost of scaling is prohibitive

The father of data warehousing agrees.

Grab your free copy of Bill Inmon’s new book, Building the Data Lakehouse.


Lakehouse transforms your data lake

Lakehouses overcome the fundamental issues that have turned data lakes into data swamps. They bring quality to your data lake by adding key data warehousing capabilities such as transactions, schemas and governance. They also leverage various performance optimization techniques to enable fast analytics. With these data management and performance optimizations to the open data lake, lakehouses can natively support BI and ML applications.

| | Data Lake | Lakehouse |
| Data formats | Open | Open |
| Data types | Any type of data | Any type of data |
| Scalability | Highly scalable | Highly scalable |
| Cost | $ | $ |
| Use cases | ML | BI, SQL, ML, Real-Time Apps |
| Data access | Highly scalable | Open APIs for direct access to files with SQL, R, Python and other languages |
| Reliability | Low quality, data swamp | High-quality, reliable data with ACID transactions |
| Governance | Poor governance because security needs to be applied to files | Fine-grained security and governance for row/columnar level for tables |
| Performance | Low | High |
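
To make "transactions" and "schemas" on top of plain files more concrete, here is a minimal Python sketch. It is an illustration only, not Delta Lake's actual implementation or API: the ToyTable class, the _log directory and the file naming are all hypothetical. The idea it shows is that a write becomes visible only once a log entry is committed, and that rows that do not match the declared schema are rejected before anything is written.

```python
import json
import os
import tempfile
import uuid


class ToyTable:
    """Hypothetical toy table: JSON data files plus an ordered commit log."""

    def __init__(self, path, schema):
        self.path = path
        self.schema = schema  # column name -> Python type, e.g. {"id": int}
        os.makedirs(os.path.join(path, "_log"), exist_ok=True)

    def _validate(self, rows):
        # Schema enforcement: reject rows with wrong columns or wrong types.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"column {col!r} expects {typ.__name__}")

    def _log_entries(self):
        return sorted(os.listdir(os.path.join(self.path, "_log")))

    def append(self, rows):
        self._validate(rows)
        # Write the data file first; readers ignore it until it is committed.
        data_file = f"part-{uuid.uuid4().hex}.json"
        with open(os.path.join(self.path, data_file), "w") as f:
            json.dump(rows, f)
        # Commit: mode "x" fails if another writer already took this version.
        version = len(self._log_entries())
        log_file = os.path.join(self.path, "_log", f"{version:08d}.json")
        with open(log_file, "x") as f:
            json.dump({"add": data_file}, f)
        return version

    def read(self):
        # Readers see only the data files named in the commit log.
        rows = []
        for entry in self._log_entries():
            with open(os.path.join(self.path, "_log", entry)) as f:
                data_file = json.load(f)["add"]
            with open(os.path.join(self.path, data_file)) as f:
                rows.extend(json.load(f))
        return rows


# Usage: one valid append succeeds, one invalid append is rejected.
table = ToyTable(tempfile.mkdtemp(), {"id": int, "name": str})
table.append([{"id": 1, "name": "alice"}])
try:
    table.append([{"id": "2", "name": "bob"}])  # "2" is not an int
except ValueError as err:
    print("rejected:", err)
print(table.read())
```

A real system has to handle much more (deletes, concurrent retries, object stores without exclusive create), so treat this as a mental model rather than a design.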


The Databricks Lakehouse

The world’s first and only lakehouse platform in the cloud

Delivered and managed as a service on AWS, Microsoft Azure, or Google Cloud, the Databricks Lakehouse Platform makes all the data in your data lake available for any number of data-driven use cases.

Data engineers can build fast and reliable data pipelines. Business analysts can perform BI, running SQL queries faster than most data warehouses. Data scientists can streamline MLOps. And when all your data teams are on a common platform, you can significantly reduce infrastructure costs, increase data team productivity and accelerate innovation.

BI & SQL
Analytics directly on your data lake

Databricks brings data analytics to your data lake, delivering data warehouse performance at data lake economics. Using open source standards to avoid data lock-in, the Databricks Lakehouse Platform provides the reliability, quality and performance capabilities that data lakes natively lack and up to 6x better price/performance than traditional cloud data warehouses.

Data Engineering
Fresh and reliable data with ease

Databricks provides an end-to-end data engineering solution — ingestion, processing and scheduling — that automates the complexity of building and maintaining pipelines and running ETL workloads directly on a data lake so data engineers can focus on quality and reliability to drive valuable insights.

Stream Processing
Easy, scalable and fault-tolerant stream processing

With Databricks, data teams can extract actionable insights from unbounded data with uninterrupted processing to deliver service guarantees at a fraction of the cost. Using Databricks for streaming use cases provides data teams the ability to create low-latency, scalable and fault-tolerant real-time data-driven applications.

Data Science and ML
Full machine learning lifecycle

Databricks provides a complete, open platform for data science and machine learning. By enabling access to high-quality, highly performant data pipelines and advanced machine learning capabilities out of the box, Databricks empowers data and ML teams to collaborate on a unified platform, accelerating the full machine learning lifecycle from feature engineering to production.

Common Security and Administration

Databricks protects your data with fine-grained access controls and the ability to easily extend security with existing cloud-native security policies and identity management systems to create private, compliant and isolated workspaces. Platform administrators can easily manage the end-to-end platform experience and control spend across every workspace.

Data Processing, Management and Governance

With automated and reliable ETL, open and secure data sharing, and a unified approach to governance that spans cloud providers, Databricks streamlines data management and forms the foundation of a cost-effective, highly scalable lakehouse.

Open Data Lake

High-quality, reliable data

Your data lake already contains the vast majority of your structured, semi-structured and unstructured data. Now combine the openness and flexibility of your data lake with strong reliability and quality to support the demands of all analytics use cases at scale.

Real-world success

The world’s leading companies are moving to the lakehouse