What is a materialized view?
A materialized view is a database object that stores the results of a query as a physical table. Unlike regular database views, which are virtual and derive their data from the underlying tables, materialized views contain precomputed data that is incrementally updated on a schedule or on demand. This precomputation of data allows for faster query response times and improved performance in certain scenarios.
Materialized views are especially useful in situations where complex queries or aggregations are performed frequently, and the underlying data changes infrequently. By storing the pre-computed results, the database can avoid the need to execute complex queries repeatedly, resulting in faster response times.
Materialized views in Databricks SQL
Materialized views in Databricks SQL are managed via Unity Catalog. They store pre-computed results based on the latest data from source tables. Unlike traditional implementations, Databricks materialized views retain the data state at the time of the last refresh, rather than updating every time they are queried. You have the flexibility to manually refresh the materialized views or set up scheduled automatic refreshes.
Databricks SQL materialized views are particularly useful for ETL (extract, transform, load) processing. They provide a simple and declarative approach to handle compliance, corrections, aggregations, and change data capture (CDC). Materialized views significantly improve query latency and reduce costs by precomputing slow queries and frequently used computations. Additionally, they enable seamless transformations by cleaning, enriching, and denormalizing base tables. In certain cases, materialized views can incrementally compute changes from the base tables, resulting in reduced costs and a streamlined user experience.
Databricks first introduced materialized views as part of the Lakehouse architecture, with the launch of Delta Live Tables. Creating a materialized view in a DB SQL warehouse automatically creates a Delta Live Tables pipeline to manage view refreshes. You can easily monitor the status of refresh operations using the Delta Live Tables UI, API, or CLI.
Creating a materialized view
The following example creates the materialized view customer_orders from base tables orders and customers:
CREATE MATERIALIZED VIEW customer_orders AS SELECT customers.name, sum(orders.amount), orders.orderdate FROM orders LEFT JOIN customers ON orders.custkey = customers.c_custkey GROUP BY name, orderdate;
Refreshing a materialized view
In Databricks SQL, you have the option to set up automatic refresh for a materialized view based on a predefined schedule. This schedule can be configured during the creation of the materialized view using the SCHEDULE clause or added later using the ALTER VIEW statement. Once a schedule is established, a Databricks job is automatically created to handle the updates.
To review the schedule details at any time, you can use the DESCRIBE EXTENDED statement, which provides visibility into the configured schedule for the materialized view. This allows you to easily monitor and manage the automatic refresh schedule for your materialized view in Databricks SQL.