A data vault is a data modeling design pattern used to build a data warehouse for enterprise-scale analytics. The data vault has three types of entities: hubs, links, and satellites.
Hubs represent core business concepts, links represent relationships between hubs, and satellites store descriptive information about hubs and about the relationships between them.
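To make the three entity types concrete, here is a minimal sketch in Python. The customer/order domain, the class and field names, and the use of a SHA-256 hash as the surrogate key are all assumptions chosen for illustration; they are not prescribed by the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic surrogate key from one or more business keys.

    Hashing the normalized business key is an assumed convention here.
    """
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubCustomer:
    """Hub: one row per core business concept (here, a customer)."""
    customer_hk: str      # surrogate key
    customer_id: str      # natural (business) key
    load_ts: datetime
    record_source: str


@dataclass(frozen=True)
class LinkCustomerOrder:
    """Link: a relationship between two hubs; it carries keys only."""
    link_hk: str
    customer_hk: str
    order_hk: str
    load_ts: datetime
    record_source: str


@dataclass(frozen=True)
class SatCustomerDetails:
    """Satellite: descriptive attributes attached to a hub (or a link)."""
    customer_hk: str
    load_ts: datetime
    name: str
    email: str
    record_source: str


now = datetime.now(timezone.utc)
hub = HubCustomer(hash_key("C001"), "C001", now, "crm")
sat = SatCustomerDetails(hub.customer_hk, now, "Ada", "ada@example.com", "crm")
link = LinkCustomerOrder(
    hash_key("C001", "O100"), hub.customer_hk, hash_key("O100"), now, "orders"
)
```

Note that the hub and the link hold only keys and load metadata, while everything descriptive lives in the satellite.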
The data vault is a data model that is well-suited to organizations that are adopting the lakehouse paradigm.
A few additional things to keep in mind:
Data vaults are based on agile methodologies and techniques, which means that they can adapt to rapidly changing business requirements. One of the major advantages of using the Data Vault methodology is that ETL jobs need less refactoring when the model changes.
With these concepts in mind, let's explore how Data Vault fits into our Bronze, Silver and Gold data layers, where data goes from a raw state to a refined state that is ready for analytics. In this multi-hop architecture, raw data is stored in a Bronze layer with minimal transformation and with a data structure as close to the source system as possible. The Data Vault methodology can be applied to the Silver layer, where data is transformed into hubs, links and satellites.
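The Bronze-to-Silver step can be sketched as follows. This is a plain-Python illustration under stated assumptions, not a production pipeline: the order rows, the column names (`order_id`, `customer_id`, `status`, `amount`, `ingested_at`), and the `hash_key` and `bronze_to_silver` helpers are all hypothetical.

```python
import hashlib


def hash_key(*business_keys):
    """Deterministic surrogate key from business keys (an assumed convention)."""
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def bronze_to_silver(bronze_rows):
    """Split raw Bronze rows into hub, link, and satellite rows."""
    hub_customer, hub_order, link_customer_order = {}, {}, {}
    sat_order = []

    for row in bronze_rows:
        customer_hk = hash_key(row["customer_id"])
        order_hk = hash_key(row["order_id"])
        link_hk = hash_key(row["customer_id"], row["order_id"])

        # Hubs: one row per distinct business key; repeats are ignored.
        hub_customer.setdefault(
            customer_hk,
            {"customer_hk": customer_hk, "customer_id": row["customer_id"]},
        )
        hub_order.setdefault(
            order_hk, {"order_hk": order_hk, "order_id": row["order_id"]}
        )

        # Link: one row per distinct customer/order relationship.
        link_customer_order.setdefault(
            link_hk,
            {"link_hk": link_hk, "customer_hk": customer_hk, "order_hk": order_hk},
        )

        # Satellite: descriptive attributes, appended with their load time
        # so that the history of changes is preserved.
        sat_order.append(
            {
                "order_hk": order_hk,
                "load_ts": row["ingested_at"],
                "status": row["status"],
                "amount": row["amount"],
            }
        )

    return {
        "hub_customer": list(hub_customer.values()),
        "hub_order": list(hub_order.values()),
        "link_customer_order": list(link_customer_order.values()),
        "sat_order": sat_order,
    }


# Hypothetical Bronze rows, kept close to the shape of the source system.
bronze = [
    {"order_id": "O100", "customer_id": "C001", "status": "NEW",
     "amount": 25.0, "ingested_at": "2024-01-15T10:00:00"},
    {"order_id": "O101", "customer_id": "C001", "status": "NEW",
     "amount": 40.0, "ingested_at": "2024-01-16T09:30:00"},
    {"order_id": "O100", "customer_id": "C001", "status": "SHIPPED",
     "amount": 25.0, "ingested_at": "2024-01-17T08:00:00"},
]
silver = bronze_to_silver(bronze)
```

In this sample, the repeated order `O100` adds no new hub or link row; it only adds a satellite row that records the status change.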
In the Gold layer, multiple data marts or data warehouses can be built following dimensional modeling (the Kimball methodology). As discussed earlier, the Gold layer is for reporting and uses more denormalized, read-optimized data models with fewer joins. Sometimes tables in the Gold layer are completely denormalized, typically when data scientists need them that way to feed their algorithms for feature engineering.
Using a Data Vault model in the Silver layer simplifies the ETL into the data marts and data warehouses and significantly reduces the changes that ETL needs, because hubs make key management (surrogate keys and natural keys) easier. Satellites make loading dimensions easier because they hold all the attributes, and links make loading fact tables straightforward because they hold all the relationships.
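As a rough illustration of that last point, the sketch below uses Python's built-in `sqlite3` module to derive a dimension from a hub and its satellite, and a fact table from a link. The table names, column names, and sample rows are hypothetical, and an in-memory SQLite database only stands in for the Silver and Gold layers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Silver layer (hypothetical Data Vault tables)
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT);
CREATE TABLE sat_customer (customer_hk TEXT, load_ts TEXT, name TEXT, city TEXT);
CREATE TABLE hub_order (order_hk TEXT PRIMARY KEY, order_id TEXT);
CREATE TABLE link_customer_order (
    link_hk TEXT PRIMARY KEY, customer_hk TEXT, order_hk TEXT
);
CREATE TABLE sat_order (order_hk TEXT, load_ts TEXT, amount REAL);

INSERT INTO hub_customer VALUES ('hk_c1', 'C001');
INSERT INTO sat_customer VALUES
    ('hk_c1', '2024-01-01', 'Ada', 'London'),
    ('hk_c1', '2024-02-01', 'Ada', 'Paris');
INSERT INTO hub_order VALUES ('hk_o1', 'O100'), ('hk_o2', 'O101');
INSERT INTO link_customer_order VALUES
    ('lk1', 'hk_c1', 'hk_o1'),
    ('lk2', 'hk_c1', 'hk_o2');
INSERT INTO sat_order VALUES
    ('hk_o1', '2024-01-15', 25.0),
    ('hk_o2', '2024-02-10', 40.0);

-- Gold dimension: the hub supplies the keys, and the most recent
-- satellite row supplies the attributes.
CREATE VIEW dim_customer AS
SELECT h.customer_hk AS customer_key, h.customer_id, s.name, s.city
FROM hub_customer h
JOIN sat_customer s ON s.customer_hk = h.customer_hk
WHERE s.load_ts = (
    SELECT MAX(load_ts) FROM sat_customer WHERE customer_hk = h.customer_hk
);

-- Gold fact: the link supplies the relationships, and the order
-- satellite supplies the measure (one satellite row per order here).
CREATE VIEW fact_order AS
SELECT l.customer_hk AS customer_key, o.order_id, so.amount
FROM link_customer_order l
JOIN hub_order o ON o.order_hk = l.order_hk
JOIN sat_order so ON so.order_hk = l.order_hk;
""")

dim_rows = conn.execute(
    "SELECT customer_id, name, city FROM dim_customer"
).fetchall()
fact_rows = conn.execute(
    "SELECT customer_key, order_id, amount FROM fact_order ORDER BY order_id"
).fetchall()
```

With this sample data, the dimension picks up the customer's latest city from the satellite, and each fact row already carries the customer key it needs to join back to the dimension.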