What is Snowflake Schema?

A data warehouse design that extends the star schema by normalizing dimension tables into multiple related tables, reducing redundancy at the cost of added query complexity.

Summary

  • Normalizes dimension tables into hierarchical structures (such as product to category to department), creating multiple related tables connected through foreign keys, in contrast to the denormalized dimensions of a star schema
  • Reduces storage space and update anomalies by eliminating redundant data, which benefits large dimension tables that have many hierarchical levels or attributes requiring frequent updates
  • Increases query complexity, because more joins are needed to reconstruct the full dimensional context; this can hurt query performance despite the storage savings, making it less favorable than a star schema for read-heavy analytical workloads

What is a snowflake schema?

A snowflake schema is a multi-dimensional data model that is an extension of a star schema, where dimension tables are broken down into subdimensions. Snowflake schemas are commonly used for business intelligence and reporting in OLAP data warehouses, data marts, and relational databases.

In a snowflake schema, engineers break down individual dimension tables into logical subdimensions. This makes the data model more complex, but it can be easier for analysts to work with for certain kinds of data, such as dimensions with a natural hierarchy (for example, product to category to department).
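As a rough illustration of breaking a dimension into subdimensions, the sketch below builds a tiny snowflake schema in an in-memory SQLite database. The table names (`sales_fact`, `product`, `category`, `department`), the columns, and the sample rows are all hypothetical and chosen only for this example; a real warehouse would use its own engine and naming.

```python
import sqlite3

# Hypothetical schema: sales_fact -> product -> category -> department.
# Each level of the product hierarchy lives in its own table.
SCHEMA = """
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE category (
    category_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(department_id)
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(category_id)
);
CREATE TABLE sales_fact (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    amount     REAL NOT NULL
);
"""


def build_snowflake(conn):
    """Create the tables and load a few made-up sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO department VALUES (?, ?)",
                     [(1, "Electronics"), (2, "Home")])
    conn.executemany("INSERT INTO category VALUES (?, ?, ?)",
                     [(10, "Phones", 1), (11, "Laptops", 1), (20, "Kitchen", 2)])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(100, "Phone A", 10), (101, "Laptop B", 11), (102, "Blender C", 20)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                     [(1, 100, 500.0), (2, 101, 1200.0), (3, 102, 80.0), (4, 100, 450.0)])
    conn.commit()


def sales_by_department(conn):
    """Total sales per department; note the three joins needed to reach it."""
    query = """
        SELECT d.name, SUM(f.amount)
        FROM sales_fact AS f
        JOIN product    AS p ON p.product_id    = f.product_id
        JOIN category   AS c ON c.category_id   = p.category_id
        JOIN department AS d ON d.department_id = c.department_id
        GROUP BY d.name
        ORDER BY d.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_snowflake(conn)
    print(sales_by_department(conn))  # [('Electronics', 2150.0), ('Home', 80.0)]
```

In a star schema the category and department names would typically sit directly on the product dimension, so the same report would need only one join.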

It's called a snowflake schema because its entity-relationship diagram (ERD) looks like a snowflake: the fact table sits at the center, and the normalized dimension tables branch outward from it.

Snowflake schemas vs. star schemas

Like star schemas, snowflake schemas have a central fact table connected to multiple dimension tables via foreign keys. The main difference is that the dimension tables in a snowflake schema are more normalized than those in a star schema.

Snowflake schemas are more storage-efficient because they adhere more closely to normalization standards, but their queries typically need more joins and so tend to be slower than on more denormalized data models. Denormalized data models like star schemas carry more data redundancy (duplication of data), which speeds up queries at the cost of extra storage.
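The redundancy trade-off can be sketched with plain Python data structures. The example below is only an illustration under assumed data: the rows, field names, and helper functions are hypothetical, and it simply counts how many rows must change when a department is renamed in each design.

```python
def sample_star_product_dim():
    """Denormalized (star-style) product dimension: hierarchy repeated per row."""
    return [
        {"product": "Phone A", "category": "Phones", "department": "Electronics"},
        {"product": "Laptop B", "category": "Laptops", "department": "Electronics"},
        {"product": "Blender C", "category": "Kitchen", "department": "Home"},
    ]


def sample_department_dim():
    """Normalized (snowflake-style) department subdimension: one row per department."""
    return [
        {"department_id": 1, "name": "Electronics"},
        {"department_id": 2, "name": "Home"},
    ]


def rename_department_star(product_dim, old_name, new_name):
    """Rename a department in the star-style dimension; return rows touched."""
    touched = 0
    for row in product_dim:
        if row["department"] == old_name:
            row["department"] = new_name
            touched += 1
    return touched


def rename_department_snowflake(department_dim, old_name, new_name):
    """Rename a department in the snowflake-style subdimension; return rows touched."""
    touched = 0
    for row in department_dim:
        if row["name"] == old_name:
            row["name"] = new_name
            touched += 1
    return touched


if __name__ == "__main__":
    star = sample_star_product_dim()
    snowflake = sample_department_dim()
    print(rename_department_star(star, "Electronics", "Consumer Electronics"))            # 2
    print(rename_department_snowflake(snowflake, "Electronics", "Consumer Electronics"))  # 1
```

With only three products the difference is small, but in the star-style dimension the number of rows to update grows with the number of products in the department, while the normalized table always changes a single row.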

Benefits of snowflake schemas

  • Lower storage requirements through reduced data redundancy
  • Enforces data quality by limiting update anomalies
  • Well suited to large dimension tables with deep hierarchies

Drawbacks of snowflake schemas

  • Lots of overhead upon initial setup
  • Rigid data model
  • High maintenance costs
