Data Governance

What is Data Governance?

Data governance is the oversight that ensures data brings value and supports the business strategy. It is more than a tool or a process: it aligns data-related requirements with the business strategy through a framework that spans people, processes, technology, and data, with a focus on building a culture that supports the business goals and objectives.

What are the business benefits of Data Governance?

As the volume and complexity of data grow, more and more organizations are looking to data governance to secure core business outcomes:

  • Consistent and high data quality as a foundation for analytics and machine learning
  • Reduced time to insight
  • Risk and compliance support for industry regulations such as HIPAA, FedRAMP, GDPR, or CCPA
  • Data democratization, i.e. enabling everybody in an organization to make data-driven decisions
  • Cost optimization, e.g. by preventing users from starting up large clusters and by creating guardrails for the use of expensive GPU instances
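
The cost guardrail in the last bullet can be pictured as a policy check that runs before a cluster is created. The following is only a minimal sketch in Python, not any product's API: the `ClusterRequest` fields, the worker limit, and the group name are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limits; in practice these would be set by the governance team.
MAX_WORKERS = 8
GPU_ALLOWED_GROUPS = {"ml-research"}


@dataclass(frozen=True)
class ClusterRequest:
    """A simplified, made-up description of a cluster a user wants to start."""
    user_group: str
    num_workers: int
    uses_gpu: bool


def check_cluster_request(req: ClusterRequest) -> List[str]:
    """Return the list of policy violations; an empty list means the request is allowed."""
    violations = []
    if req.num_workers > MAX_WORKERS:
        violations.append(
            f"num_workers {req.num_workers} exceeds limit {MAX_WORKERS}"
        )
    if req.uses_gpu and req.user_group not in GPU_ALLOWED_GROUPS:
        violations.append(
            f"group '{req.user_group}' may not use GPU instances"
        )
    return violations
```

A request such as `ClusterRequest("analytics", 32, True)` would come back with two violations under these assumed limits, while a small CPU-only request would pass with an empty list.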

What does a good data governance solution look like?

Data-driven companies typically build their data architectures for analytics on the lakehouse. A data lakehouse is an architecture that enables efficient and secure data engineering, machine learning, data warehousing, and business intelligence directly on vast amounts of data stored in data lakes. Data governance for a data lakehouse provides a number of key capabilities:

  • Unified Catalog: A unified catalog stores all your data, ML models, and analytics artifacts, as well as metadata for each data object. The unified catalog also blends in data from other catalogs such as an existing Hive metastore.
  • Unified data access controls: A single and unified permissions model across all data assets and all clouds. This includes attribute-based access control (ABAC) for personally identifiable information (PII).
  • Data Auditing: Data access is centrally audited with alerts and monitoring capabilities to promote accountability.
  • Data Quality Management: Robust data quality management with built-in quality controls, testing, monitoring, and enforcement to ensure accurate and useful data is available for downstream BI, analytics, and machine learning workloads.
  • Data Lineage: End-to-end visibility into how data flows through the lakehouse from source to consumption.
  • Data Discovery: Easy data discovery that enables data scientists, data analysts, and data engineers to quickly find and reference relevant data and accelerate time to value.
  • Data Sharing: Data can be shared across clouds and platforms.
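
To make two of these capabilities concrete, attribute-based access control and auditing can be sketched together in a few lines of Python. This is an illustrative toy, not how any particular catalog is implemented: the `Catalog` class, the `pii` tag, and the `pii_reader` user attribute are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Catalog:
    """Toy catalog: columns carry tags, and every access decision is logged."""
    # Maps column name -> set of tags (attributes), e.g. {"pii"}.
    column_tags: Dict[str, Set[str]]
    audit_log: List[dict] = field(default_factory=list)

    def readable_columns(self, user: str, user_attrs: Set[str]) -> List[str]:
        """Return the columns the user may read and record the decision."""
        allowed = [
            col
            for col, tags in self.column_tags.items()
            # Assumed rule: PII-tagged columns require the "pii_reader" attribute.
            if "pii" not in tags or "pii_reader" in user_attrs
        ]
        denied = [col for col in self.column_tags if col not in allowed]
        # Central audit trail: who asked, what was granted, what was withheld.
        self.audit_log.append({"user": user, "allowed": allowed, "denied": denied})
        return allowed


catalog = Catalog({"order_id": set(), "email": {"pii"}})
analyst_view = catalog.readable_columns("analyst", set())
steward_view = catalog.readable_columns("steward", {"pii_reader"})
```

The point of the design is that the rule is attached to the data's attributes rather than to individual tables or users, so tagging a new column as PII protects it without any further permission changes, and each decision leaves an audit record.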

What is the difference between Data Management and Data Governance?

Data management focuses on the activities that deliver trusted data in compliance with data governance policies, principles, and standards. Such activities are usually project-focused and short-term. Data governance, by contrast, is treated as a program that realizes longer-term benefits. A centralized governance tool plays a key role in the implementation of governance.
