Skip to main content
Platform blog

In this blog post, we will share how you can use Databricks SQL Materialized Views with Lakeview dashboards to deliver fresh data and insights to your business.

We recently announced the public preview of Lakeview dashboards on the Databricks Databricks Data Intelligence Platform. Lakeview dashboards represent a significant step forward in creating visualizations and reporting experiences for Databricks customers. They provide significant improvements in visualization and a simplified design experience that is optimized for sharing and distribution.

Lakeview dashboards run on the Databricks SQL (DBSQL) data warehouse. DBSQL lets you run all of your SQL and BI applications at scale with your tools of choice at a fraction of the cost of legacy cloud data warehouses.

Materialized Views help deliver fresh data to the business

Materialized views (MVs) are like regular views but are pre-computed and automatically kept up-to-date as the sources change. End-user queries and dashboards are fast because they are querying data that is precomputed instead of directly querying potentially massive data volumes in the source tables. We saw an improvement in dashboard performance in our internal use case that involved the reporting of Lakeview's usage data and event logs. The dashboard response time reduced significantly from over thirty seconds to just 1-2 seconds per query. Query speed improvements are heavily dependent on the MV query definition so users are encouraged to experiment on their own data.

Materialized views

Let's take a closer look at how they work and their benefits.

  1. First, MVs can quickly process complex transformations in a declarative way - they automatically handle updates/deletes and any changes to the source and can be defined for any query, making them ideal for ETL.
  2. Data flows faster and users get more up-to-date results because MVs incrementally refresh updates to their sources, which avoids the need to rebuild the view when new data arrives, lowering overall compute time.
  3. End-user queries and dashboards are faster because data is pre-computed - it is much faster to query pre-computed data vs. querying the massive data volumes in the base tables.

MVs can also be used in conjunction with Streaming Tables for incrementally ingesting data into the Databricks SQL data warehouse, empowering your users to build end-to-end ETL pipelines in a streaming fashion.

Step-by-step Guide - Using MVs to speed up Lakeview dashboards

In the following guide, we will provide step-by-step instructions on how to leverage MVs to speed up your Lakeview dashboards.

MVs are best used to summarize large source tables and complex transformations, for example computing aggregates from raw data.

Step 1: Create a materialized view

Let's say you need to analyze the total revenue generated from orders, segmented by the date and priority level of each order, which has to be refreshed hourly. Your typical SQL query would be:

SELECT
     o_orderdate AS order_date,
     o_orderpriority AS order_priority,
     sum(o_totalprice) AS total_price
FROM
     demo.dss_acme.orders
WHERE
     o_orderdate > '2023-12-01'
AND
     o_orderdate < '2023-12-31'
GROUP BY ALL

You could use the above query in your lakeview dashboard, but you would still be missing the data freshness requirement: you still need to find a way to run the query every hour. Also, if the data volume is large, creating a new table may take longer than one hour in which case you could never meet your data freshness requirement. Using a materialized view makes it easy to set a schedule and because the MV's may be incrementally refreshed, the updates can be very fast as compared to a full recompute of the table.

To use this query in a materialized view, simply wrap the query in a CREATE statement for a materialized view; put it in a DBSQL Query; and run it, like so:

create materialized view if not exists
  demo.dss_acme.orders_mw
  - - - Refresh hourly
  schedule cron '0 0 * * * ?'
as (
SELECT
     o_orderdate AS order_date,
     o_orderpriority AS order_priority,
     sum(o_totalprice) AS total_price
FROM
     demo.dss_acme.orders
WHERE
     o_orderdate > '2023-12-01'
AND
     o_orderdate < '2023-12-31'
GROUP BY ALL
);

The schedule operator tells the MV to update periodically, in this case the cron syntax specifies the MV to refresh every hour. Without that part, your dashboard would never pick up new data that has arrived since you created the materialized view. See the CREATE MATERIALIZED VIEW and quartz cron syntax documentation for more information. You may also see the query definition and other details about your MV in the catalog explorer.

Step 2: Give others access to your materialized view

You need to give others access to your materialized view so that they can use it when they're looking at your dashboard by following these steps:

  1. Give USE SCHEMA access to the catalog that the materialized view is in by running the following command:
    GRANT USAGE ON SCHEMA demo.dss_acme TO user_or_role;
  2. Grant SELECT privileges on the MV with the following command:
    GRANT SELECT ON demo.dss_acme.orders_mw TO user_or_role;

You may alternatively use the Databricks Catalog Explorer UI to manage permissions instead of running SQL queries. See the documentation for more information.

Step 3: Use the materialized view in Lakeview dashboard

Now you have an MV that has been created to pre-compute your original Lakeview dashboard query. The last step is to update the existing Lakeview dashboard to replace the SQL to query this new MV instead of the original one.

Go back to your Lakeview dashboard to update the code as follows:

select * from demo.dss_acme.orders_mw;

Step 4: Enjoy that faster dashboard!

…and that should be it! If everything worked correctly, your dashboard should be fast now!

Dashboard

Conclusion

Lakeview dashboards are a powerful visualization and reporting tool on the Databricks Data Intelligence Platform. Materialized Views are a new capability that can be used to significantly improve end-user response times for Lakeview dashboards. With a few clicks, you'll be able to quickly create a faster end-user experience by combining MVs with Lakeview.

You must be enrolled in the public preview to create and manage materialized views. You may request access to the public preview of Materialized Views by clicking on this link. Find documentation for Materialized Views and Lakeview Dashboards at these links.

Try Databricks for free

Related posts

Platform blog

Introducing Materialized Views and Streaming Tables for Databricks SQL

We are thrilled to announce that materialized views and streaming tables are now publicly available in Databricks SQL on AWS and Azure. Streaming...
Platform blog

Announcing the Public Preview of Lakeview Dashboards!

We are excited to announce the public preview of the next generation of Databricks SQL dashboards, dubbed Lakeview dashboards. Available today, this new...
See all Platform Blog posts