What is Medallion Architecture?

Lakehouse design pattern organizing data into bronze (raw), silver (cleaned), and gold (aggregated) layers for progressive data quality refinement

by Databricks Staff

Medallion architecture is a data design pattern that organizes lakehouse data into Bronze, Silver and Gold layers to progressively improve data quality and structure.
Bronze tables capture raw data, Silver tables clean and standardize it and Gold tables contain business level aggregates and features ready for analytics and machine learning.
This layered approach promotes reuse, governance and performance while giving different teams clear entry points into shared data on the Databricks Lakehouse.

What is a medallion architecture?

A medallion architecture is a data design pattern used to logically organize data in a lakehouse, with the goal of incrementally and progressively improving the structure and quality of data as it flows through each layer of the architecture (from Bronze ⇒ Silver ⇒ Gold layer tables). Medallion architectures are sometimes also referred to as "multi-hop" architectures.

Building Reliable, Performant Data Pipelines with Delta Lake

Building data pipelines with medallion architecture

Databricks provides tools like Spark Declarative Pipelines that let users build data pipelines with Bronze, Silver and Gold tables from just a few lines of code. With streaming tables and materialized views, users can also create streaming Lakeflow pipelines, built on Apache Spark™ Structured Streaming, that are incrementally refreshed and updated. For more details, see the Databricks documentation on combining streaming tables and materialized views in a single pipeline.

Bronze layer (raw data)

The Bronze layer is where we land all the data from external source systems. The table structures in this layer correspond to the source system table structures "as-is," along with any additional metadata columns that capture the load date/time, process ID, etc. The focus in this layer is on quick Change Data Capture and on providing a historical archive of the source (cold storage), data lineage, auditability and, if needed, reprocessing without rereading the data from the source system.
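As a rough illustration of the "as-is plus metadata" idea, here is a minimal sketch in plain Python rather than Spark. The function name `land_in_bronze`, the metadata column names (`_source`, `_loaded_at`, `_process_id`) and the sample CRM rows are all assumptions made for this example, not part of any Databricks API.

```python
from datetime import datetime, timezone
import uuid


def land_in_bronze(source_rows, source_name, process_id=None):
    """Copy source rows unchanged and attach load metadata columns.

    Hypothetical helper: names and column conventions are illustrative only.
    """
    process_id = process_id or str(uuid.uuid4())
    loaded_at = datetime.now(timezone.utc).isoformat()
    return [
        {**row, "_source": source_name, "_loaded_at": loaded_at, "_process_id": process_id}
        for row in source_rows
    ]


# Hypothetical source rows. The duplicate and the untrimmed email are kept
# on purpose: Bronze stores what the source sent, without cleansing.
crm_rows = [
    {"customer_id": "42", "email": " Ada@Example.com "},
    {"customer_id": "42", "email": " Ada@Example.com "},
]
bronze_customers = land_in_bronze(crm_rows, source_name="crm", process_id="run-001")
```

Because nothing is filtered or corrected at this stage, the Bronze rows can later be replayed to rebuild downstream tables without going back to the source system.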

Silver layer (cleansed and conformed data)

In the Silver layer of the lakehouse, the data from the Bronze layer is matched, merged, conformed and cleansed ("just-enough") so that the Silver layer can provide an "Enterprise view" of all its key business entities, concepts and transactions (e.g. master customers, stores, non-duplicated transactions and cross-reference tables).

The Silver layer brings the data from different sources into an Enterprise view and enables self-service analytics for ad-hoc reporting, advanced analytics and ML. It serves as a source for Departmental Analysts, Data Engineers and Data Scientists, who build on it to create projects and analyses that answer business problems through enterprise and departmental data projects in the Gold layer.

In the lakehouse data engineering paradigm, the ELT methodology is typically followed rather than ETL, which means only minimal or "just-enough" transformations and data cleansing rules are applied while loading the Silver layer. Speed and agility in ingesting and delivering data in the data lake are prioritized, and many project-specific complex transformations and business rules are applied while loading data from the Silver to the Gold layer. From a data modeling perspective, the Silver layer has more 3rd-Normal-Form-like data models. Data Vault-like, write-performant data models can also be used in this layer.
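The "just-enough" cleansing and de-duplication described for the Silver layer can be sketched as follows. This is plain Python under stated assumptions, not a Spark or Delta Lake implementation: the function `to_silver`, the `_loaded_at` metadata column and the "latest record per key wins" rule are all choices made for this example.

```python
def to_silver(bronze_rows):
    """Cleanse, conform and de-duplicate Bronze rows (hypothetical helper).

    Keeps one row per customer_id, preferring the most recently loaded one.
    """
    latest = {}
    for row in bronze_rows:
        raw_id = (row.get("customer_id") or "").strip()
        if not raw_id.isdecimal():
            continue  # just-enough cleansing: skip rows without a usable key
        cleaned = {
            "customer_id": int(raw_id),                          # conform the type
            "email": (row.get("email") or "").strip().lower(),   # standardize
            "loaded_at": row["_loaded_at"],
        }
        current = latest.get(cleaned["customer_id"])
        # ISO-8601 strings in the same format compare chronologically.
        if current is None or cleaned["loaded_at"] >= current["loaded_at"]:
            latest[cleaned["customer_id"]] = cleaned
    return sorted(latest.values(), key=lambda r: r["customer_id"])


# Hypothetical Bronze rows, including a duplicate key and a row with no key.
bronze_rows = [
    {"customer_id": "42", "email": " Ada@Example.com ", "_loaded_at": "2024-01-01T00:00:00"},
    {"customer_id": "42", "email": "ada@example.com", "_loaded_at": "2024-01-02T00:00:00"},
    {"customer_id": "", "email": "nobody@example.com", "_loaded_at": "2024-01-02T00:00:00"},
    {"customer_id": "7", "email": "Bob@Example.com", "_loaded_at": "2024-01-01T00:00:00"},
]
silver_customers = to_silver(bronze_rows)
```

Project-specific business rules are deliberately absent here; in the pattern described above they belong in the Silver-to-Gold step.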

Gold layer (curated business-level tables)

Data in the Gold layer of the lakehouse is typically organized in consumption-ready "project-specific" databases. The Gold layer is for reporting and uses more de-normalized and read-optimized data models with fewer joins. The final layer of data transformations and data quality rules is applied here. The final presentation layers of projects such as Customer Analytics, Product Quality Analytics, Inventory Analytics, Customer Segmentation, Product Recommendations and Marketing/Sales Analytics fit in this layer. Many Kimball-style star schema-based data models or Inmon-style data marts fit in this Gold layer of the lakehouse.
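To make "de-normalized and read-optimized" concrete, the sketch below pre-joins and aggregates two Silver tables into a single reporting table so that consumers need no further joins. It is plain Python for illustration only; the function `build_gold_sales_by_store`, the table shapes and the column names are hypothetical.

```python
from collections import defaultdict


def build_gold_sales_by_store(silver_transactions, silver_stores):
    """Join and aggregate Silver tables into one reporting table (hypothetical)."""
    store_names = {s["store_id"]: s["name"] for s in silver_stores}
    totals = defaultdict(lambda: {"order_count": 0, "revenue_cents": 0})
    for txn in silver_transactions:
        agg = totals[txn["store_id"]]
        agg["order_count"] += 1
        agg["revenue_cents"] += txn["amount_cents"]
    # The store name is copied into each row so readers do not have to join.
    return [
        {"store_id": sid, "store_name": store_names.get(sid, "unknown"), **agg}
        for sid, agg in sorted(totals.items(), key=lambda item: item[0])
    ]


# Hypothetical Silver inputs; amounts are integer cents to avoid float rounding.
silver_stores = [{"store_id": 1, "name": "Downtown"}, {"store_id": 2, "name": "Airport"}]
silver_transactions = [
    {"txn_id": "a", "store_id": 1, "amount_cents": 1000},
    {"txn_id": "b", "store_id": 1, "amount_cents": 550},
    {"txn_id": "c", "store_id": 2, "amount_cents": 2000},
]
gold_sales_by_store = build_gold_sales_by_store(silver_transactions, silver_stores)
```

A dashboard could read `gold_sales_by_store` directly, which is the trade-off the Gold layer makes: some redundancy in exchange for simpler, faster reads.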

So you can see that the data is curated as it moves through the different layers of a lakehouse. In some cases, we also see that many data marts and EDWs from the traditional RDBMS technology stack are ingested into the lakehouse, so that for the first time enterprises can do "pan-EDW" advanced analytics and ML, which was either not possible or too cost-prohibitive on a traditional stack (e.g. IoT/Manufacturing data is tied with Sales and Marketing data for defect analysis, or health care genomics and EMR/HL7 clinical data marts are tied with financial claims data to create a Healthcare Data Lake for timely and improved patient care analytics).

Benefits of a lakehouse architecture

Simple data model
Easy to understand and implement
Enables incremental ETL
Can recreate your tables from raw data at any time
ACID transactions, time travel
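Two of the benefits listed above, incremental ETL and the ability to recreate tables from raw data, can be illustrated together. The sketch below is plain Python with a hypothetical `apply_batch` upsert and made-up order rows; it does not model ACID transactions or time travel, which come from the underlying table format rather than from the layering itself.

```python
def apply_batch(silver, bronze_batch):
    """Upsert one Bronze batch into a Silver table keyed by order_id (hypothetical)."""
    for row in bronze_batch:
        silver[row["order_id"]] = {
            "order_id": row["order_id"],
            "status": row["status"].upper(),
        }
    return silver


# Hypothetical Bronze history: two batches that arrived at different times.
bronze_batches = [
    [{"order_id": 1, "status": "placed"}, {"order_id": 2, "status": "placed"}],
    [{"order_id": 1, "status": "shipped"}],
]

# Incremental ETL: process each batch as it arrives.
incremental = {}
for batch in bronze_batches:
    apply_batch(incremental, batch)

# Full rebuild: replay the entire Bronze history into an empty table.
all_rows = [row for batch in bronze_batches for row in batch]
rebuilt = apply_batch({}, all_rows)
```

Both paths end in the same Silver state because Bronze retains every raw row in arrival order, so a downstream table can be dropped and regenerated if its logic changes.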

A quick primer on lakehouses

A lakehouse is a data platform architecture paradigm that combines the best features of data lakes and data warehouses. A modern lakehouse is a highly scalable and performant data platform hosting both raw and prepared data sets for quick business consumption and to drive advanced business insights and decisions. It breaks data silos and allows seamless, secure data access to authorized users across the enterprise on one platform.

Databricks Lakehouse Platform Architecture

Medallion architecture and data mesh

The medallion architecture is compatible with the concept of a data mesh. Bronze and Silver tables can be joined together in a "one-to-many" fashion, meaning that the data in a single upstream table can be used to generate multiple downstream tables.
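The "one-to-many" relationship can be pictured as a single upstream table feeding several independently owned downstream tables. The following is a minimal plain-Python sketch; the table `silver_orders` and the two derived products (`revenue_by_region`, `large_orders`) are hypothetical examples, not prescribed data mesh components.

```python
from collections import Counter

# Hypothetical upstream Silver table shared by several teams.
silver_orders = [
    {"order_id": 1, "region": "EU", "amount_cents": 1000},
    {"order_id": 2, "region": "US", "amount_cents": 2500},
    {"order_id": 3, "region": "EU", "amount_cents": 500},
]


def revenue_by_region(orders):
    """Downstream table 1: revenue totals, e.g. for a finance team."""
    totals = Counter()
    for order in orders:
        totals[order["region"]] += order["amount_cents"]
    return dict(totals)


def large_orders(orders, threshold_cents=1000):
    """Downstream table 2: IDs of orders at or above a threshold, e.g. for review."""
    return [o["order_id"] for o in orders if o["amount_cents"] >= threshold_cents]


# One upstream table, two downstream tables built without affecting each other.
gold_revenue = revenue_by_region(silver_orders)
gold_large_orders = large_orders(silver_orders)
```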
