What is a data mart?
A data mart is a curated database containing a set of tables designed to serve the specific needs of a single data team, community, or line of business, such as the marketing or engineering department. It is normally smaller and more focused than a data warehouse and generally exists as a subset of an organization's larger enterprise data warehouse. Data marts are commonly used for analytics, business intelligence, and reporting. Historically, data marts were the first evolutionary step toward physical central data warehouses and data lakes. ACNielsen offered its clients the first data mart in the early 1970s, giving them a way to store information digitally and boost their sales efforts.
Characteristics of data marts
- Typically built and managed by the enterprise data team, although business unit subject matter experts (SMEs) can also build and maintain them organically.
- Maintained by business group data stewards. End users have read-only access: they can query and view tables but cannot modify them, which prevents less technically savvy users from accidentally deleting or modifying critical business data.
- Typically uses a dimensional model and star schema.
- Contains a curated subset of data from the larger data warehouse. The data is highly structured, having been cleansed and conformed by the enterprise data team to make it easy to understand and query.
- Designed around the unique needs of a particular line of business or use case.
- Users typically query the data using SQL commands.
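The characteristics above (a star schema, read-only access for end users, and SQL queries) can be sketched with Python's built-in `sqlite3` module, used here only as a lightweight stand-in for a real data mart platform. The sketch builds a tiny star schema with one fact table and two dimension tables, opens it through a read-only connection as an end user would, answers a question with a SQL join, and shows that a write attempt is rejected. The table names, columns, and sample rows are hypothetical, and SQLite's `mode=ro` URI parameter is just one illustrative way to enforce read-only access; a production platform would more likely enforce this with user or group permissions.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical star schema: fact_sales references two dimension tables.
SCHEMA = """
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,  -- e.g. 202401 for January 2024
    year    INTEGER NOT NULL,
    month   INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product (product_id),
    date_id    INTEGER REFERENCES dim_date (date_id),
    amount     REAL NOT NULL
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
INSERT INTO dim_date VALUES (202401, 2024, 1), (202402, 2024, 2);
INSERT INTO fact_sales VALUES (1, 202401, 100.0), (2, 202401, 50.0), (3, 202402, 20.0), (1, 202402, 30.0);
"""


def build_sales_mart(path: str) -> None:
    """Data team side: create and populate the mart over a read-write connection."""
    conn = sqlite3.connect(path)
    try:
        conn.executescript(SCHEMA)
        conn.commit()
    finally:
        conn.close()


def open_read_only(path: str) -> sqlite3.Connection:
    """End-user side: mode=ro makes SQLite reject every write on this connection."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def revenue_by_category(conn: sqlite3.Connection) -> list:
    """Typical analyst query: join the fact table to a dimension and aggregate."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sales_mart.db")
        build_sales_mart(path)
        reader = open_read_only(path)
        try:
            print(revenue_by_category(reader))  # [('Books', 20.0), ('Hardware', 180.0)]
            try:
                reader.execute("DELETE FROM fact_sales")
            except sqlite3.OperationalError as exc:
                print("write rejected:", exc)
        finally:
            reader.close()
```

The dimension tables hold descriptive attributes (product category, calendar month) while the fact table holds the measurable events, so most analyst questions become a join from the fact table to one or more dimensions followed by an aggregate.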
Types of data marts: independent data marts, dependent data marts, and hybrid data marts
Today, there are three basic types of data marts:
- Independent data marts are not part of a data warehouse and are very similar to the original data mart offered by ACNielsen. They typically focus on a single business area or subject. Their data can come from both internal and external sources; it is transformed, processed, and loaded into the data mart, where it is stored until needed.
- Dependent data marts are built from an existing data warehouse. They follow a top-down approach: all data is first stored in a centralized location, and a clearly defined section of that data is then selected for analysis.
- Hybrid data marts combine data taken from a data warehouse with data from other sources. This can be useful in a variety of situations, such as the ad hoc integration of a new group or product that has been added to an organization. Hybrid data marts are well suited to environments with multiple databases and offer fast implementation turnaround. These systems make data cleansing easy and work well with smaller, data-centric applications.
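The three types differ mainly in where a mart's data comes from. The minimal sketch below illustrates that difference with Python's `sqlite3`, keeping everything in one in-memory database for brevity; a real independent data mart would usually live outside the warehouse entirely. The warehouse table `wh_orders`, the external `ext_campaigns` feed, the cleansing rule, and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical enterprise warehouse: orders for every department in one table.
WAREHOUSE_ROWS = [
    (1, "marketing", "acme", 120.0),
    (2, "engineering", "globex", 300.0),
    (3, "marketing", "initech", 80.0),
]

# Hypothetical external source (e.g. an ad platform export), not yet cleansed.
EXTERNAL_ROWS = [
    ("spring_sale", " ACME ", "10"),
    ("spring_sale", "Initech", "5"),
    ("launch", "acme", "7"),
]


def load_warehouse(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE wh_orders (order_id INTEGER PRIMARY KEY, department TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO wh_orders VALUES (?, ?, ?, ?)", WAREHOUSE_ROWS)


def load_external(conn: sqlite3.Connection, rows: list) -> None:
    # Transform step: trim and lowercase customer names, cast clicks to int.
    cleaned = [(campaign, customer.strip().lower(), int(clicks)) for campaign, customer, clicks in rows]
    conn.execute("CREATE TABLE ext_campaigns (campaign TEXT, customer TEXT, clicks INTEGER)")
    conn.executemany("INSERT INTO ext_campaigns VALUES (?, ?, ?)", cleaned)


def build_marts(conn: sqlite3.Connection) -> None:
    # Independent mart: built only from the external source, no warehouse involved.
    conn.execute(
        """CREATE TABLE mart_campaign_clicks AS
           SELECT campaign, SUM(clicks) AS clicks FROM ext_campaigns GROUP BY campaign"""
    )
    # Dependent mart: a clearly defined subset selected from the warehouse (top-down).
    conn.execute(
        """CREATE TABLE mart_marketing_orders AS
           SELECT order_id, customer, amount FROM wh_orders WHERE department = 'marketing'"""
    )
    # Hybrid mart: the warehouse subset combined with the external source.
    conn.execute(
        """CREATE TABLE mart_marketing_hybrid AS
           SELECT w.customer, w.amount, e.campaign
           FROM wh_orders AS w JOIN ext_campaigns AS e ON e.customer = w.customer
           WHERE w.department = 'marketing'"""
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_warehouse(conn)
    load_external(conn, EXTERNAL_ROWS)
    build_marts(conn)
    print(conn.execute("SELECT * FROM mart_campaign_clicks ORDER BY campaign").fetchall())
    print(conn.execute("SELECT * FROM mart_marketing_orders ORDER BY order_id").fetchall())
    print(conn.execute("SELECT * FROM mart_marketing_hybrid ORDER BY customer, campaign").fetchall())
    conn.close()
```

Note that the hybrid join only works because the external names were cleansed first: without the trim-and-lowercase step, `" ACME "` would never match the warehouse's `"acme"`.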
Benefits of data marts
- Single source of truth: the data mart can serve as a single source of truth for a particular line of business, so everyone works from the same facts and data.
- Simplicity: business users looking for data can go to the curated data mart for easy access to the data they care about, instead of wading through the entire data warehouse and joining tables together to get what they need.
Challenges with data marts
Enterprise data warehouses are created with the good intention of serving all of an enterprise's data management needs. In practice, however, no single warehouse can keep everyone happy, because different business units have different data needs and objectives. So departments copy data and create their own data marts (sometimes with help from enterprise IT) to augment a particular subject area of the data warehouse and meet their self-service analytics and departmental reporting needs. Over time, these data marts serve each department's needs well but, from an enterprise perspective, become data silos and shadow copies of data. When many departments do this, there is no longer a single version of the truth.
How Lakehouse solves the challenges with data marts
Lakehouse solves the challenges above by putting all of the enterprise data warehouses and data marts on one platform with unified security and governance, while still giving different teams the flexibility to have their own sandboxes. Because every data mart or "augmented copy" is created on the same Lakehouse platform as all the others, the Lakehouse's data catalog discovers it. Combined with data governance rules such as tagging and a data dictionary, this makes each augmented copy discoverable by everyone, which prevents teams from creating similar duplicate copies.
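The discoverability mechanism described above can be illustrated with a toy, in-memory catalog. This is not a Databricks API: the `Catalog` and `CatalogEntry` classes, their fields, and the rule that a new copy counts as a duplicate when an existing entry has the same source table and at least the same tags are all hypothetical simplifications. The sketch only shows the idea that when every copy is registered in one shared catalog, a team can find an existing copy before creating another.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata for one table or augmented copy."""
    name: str
    owner: str
    source: str  # table the copy was derived from
    tags: set[str] = field(default_factory=set)


class Catalog:
    """Hypothetical shared catalog: every team registers its copies here."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def find(self, source: str | None = None, tags: set[str] | None = None) -> list[CatalogEntry]:
        """Entries derived from `source` (if given) that carry at least all of `tags`."""
        wanted = tags or set()
        return [
            e for e in self._entries.values()
            if (source is None or e.source == source) and wanted <= e.tags
        ]

    def create_copy(self, entry: CatalogEntry) -> tuple[CatalogEntry, bool]:
        """Register `entry` unless a matching copy already exists.

        Returns (entry_to_use, created).
        """
        existing = self.find(entry.source, entry.tags)
        if existing:
            return existing[0], False
        self._entries[entry.name] = entry
        return entry, True


if __name__ == "__main__":
    catalog = Catalog()
    mkt, created = catalog.create_copy(
        CatalogEntry("mkt_orders_enriched", "marketing", "wh_orders", {"orders", "campaigns"})
    )
    print(mkt.name, created)  # mkt_orders_enriched True
    reused, created = catalog.create_copy(
        CatalogEntry("sales_orders_copy", "sales", "wh_orders", {"orders"})
    )
    print(reused.name, created)  # mkt_orders_enriched False
```

In the usage example, the sales team's request is satisfied by the marketing team's existing copy, so only one augmented copy of `wh_orders` ends up in the catalog.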