Skip to main content

Introducing SQL Stored Procedures in Databricks

The best data warehouse is an open lakehouse

SQL Stored Procedures blog OG

Summary

  • Reusable SQL logic: Store and run complex logic with parameters for consistent, repeatable results.
  • Easy migration: Move stored procedures from existing enterprise data warehouses without rewriting.
  • Enterprise-ready: Fully governed by Unity Catalog, ANSI-compliant, and interoperable with open source

Managing repetitive SQL tasks—like cleaning data, updating business rules, or running batch logic can be tedious and error-prone if you’re copying and pasting code. 

Now, SQL Stored Procedures in Databricks let you store that logic once, run it whenever you need, and keep it governed under Unity Catalog. 

Whether you're cleaning up data before analytics, updating tables based on business criteria, or moving workloads from a legacy enterprise data warehouse, stored procedures make the process simpler, more consistent, and easier to maintain.

Databricks supports open standards and interoperability, avoiding proprietary or vendor-specific implementations. SQL Stored Procedures follow the ANSI/PSM SQL standard and will be contributed to open source Apache Spark™.

Procedures are widely used in administrative tasks, data management, and ETL workflows—especially in enterprise data warehouses (EDWs). For customers moving from EDWs to Databricks, existing stored procedures can be migrated without rewriting, making the transition simpler. And as always, the best data warehouse is a lakehouse.

For one of our critical use cases around customer segmentation, we leveraged SQL Stored Procedures with DBSQL to achieve better performance, scalability, and cost efficiency. Being familiar with SQL helped us implement and deploy the solution to production in a very short time. Using Stored Procedures has enabled us to manage complex logic more effectively while keeping the overall architecture streamlined and maintainable. —SambaSiva Rao, Sr. Data Engineer/Architect, ClicTechnologies

Overview of SQL Stored Procedures

What are Stored Procedures?

Across data processing workflows, customers can struggle to maintain consistency and performance of repetitive tasks and complex logic. Stored procedures are a great approach in these cases, ensuring data is processed in a consistent, standardized way, and performance is optimal. 

For data cleaning tasks, procedures can apply transformations such as converting inconsistent date formats into a standardized structure, trimming leading and trailing whitespaces from text fields, and replacing or correcting erroneous values. This ensures that your data is prepared for downstream analysis. See the detailed ETL example below.

On the data management side, stored procedures can efficiently update table values based on defined business rules—such as flagging outdated records, recalculating fields, or synchronizing data across related tables. By encapsulating these operations into procedures, teams can ensure consistent execution, reduce manual intervention, and improve data quality at scale. See the detailed data management example below, using stored procedures to update a loyalty/membership program.

So what are Procedures? They are pre-compiled collections of SQL statements that allow a user to manage their SQL logic into a single, reusable unit. Procedures are stored in Unity Catalog, meaning they are governed and fully encapsulate permissions. When a stored procedure is called, the database executes these pre-defined operations, offering the benefit of enhanced security, simplified maintenance of complex workloads, and the potential for improved performance.

What is supported in Public Preview?

There are 5 core commands that support procedures: CREATE, CALL, DESCRIBE, SHOW, and DROP.

  • CREATE PROCEDURE: Define and store a new stored procedure within Unity Catalog. It specifies the procedure's name, parameters (if any), and the SQL statements to be executed when the procedure is called.
  • CALL PROCEDURE: Execute a previously created stored procedure; passing in any required parameters.
  • DESCRIBE PROCEDURE: Return the basic metadata about an existing procedure, such as its name and parameters. With EXTENDED, the description will include additional metadata including owner, datetime it was created, security type, etc.
  • SHOW PROCEDURES: List all stored procedures available within the current catalog schema.
  • DROP PROCEDURE: Delete an existing procedure from storage. 

When creating a procedure, you can use several parameter types to control the input and output. 

  • `IN` parameter: Used to pass values into a procedure as input. For example, you might pass in a customer ID to retrieve or process only that customer's data. The procedure can read but not modify these values.
  • `OUT` parameter: Used for returning values from a procedure, after being assigned. For example, you might pass in a customer ID, and return their account status or computed sales total, for further processing outside the procedure.
  • `INOUT` parameter: Serves a dual purpose, allowing a value to be passed into a procedure, modified within it, and the updated value returned. It functions as both an input and an output. 

These parameters can be assigned to:

  • Local variables, declared within a script/procedure
  • Session variables, declared within the session, and even outside of the script/procedure

The logic encapsulated within a SQL Stored Procedure is built on top of SQL Scripting. A stored procedure can be considered as a reusable script with parameters, governed by Unity Catalog. You can read about Scripting in these two introductory blogs:

Nested and recursive procedure calls are supported, meaning that customers can organize their units of work or business logic conveniently into separate procedures, making the entire SQL execution flow more modular. This improves readability and maintenance.

How are Procedures different from Functions?

Procedures are grouped with Functions in Unity Catalog in the UI. However, procedures and functions, while letting you reuse SQL logic, serve different purposes.

A function is used to return a value or a table. It must be used within a SQL query and cannot include dynamic SQL or procedural logic. A procedure, by contrast, is used to execute a sequence of SQL statements. It can include control flow, variables, loops, and dynamic SQL using IDENTIFIER and EXECUTE IMMEDIATE. You call a procedure as a standalone command, typically to perform a task or workflow.

Examples of using SQL Stored Procedures

Now that we’ve covered the capabilities of SQL Stored Procedures, let’s explore some examples to demonstrate their value and the problems they help solve.

You can use this notebook to follow along - it contains all examples from this post, as well as data preparation commands.

ETL – Data Cleaning: Preparing Silver or Gold layer tables

If you follow the typical medallion architecture, you know moving data from Bronze to Silver (or Silver to Gold) can require cleaning, transforming, aggregating, and formatting data. Stored Procedures are great for managing repetitive processes like these within an ETL workflow. 

In this ETL scenario, a Procedure is used to:

  • Load data from raw into a fact table
  • Select data based on date range and origin of sale (web, mobile, point-of-sale, 3p-vendor)
  • Clean and format the data by converting dates to a specific format and removing whitespaces
  • Once clean, loading the data into a ‘clean’ table
  • Add a log record based on timestamp, to-from dates, and origin of sale
  • Using the Procedure, look up the mobileApp sales for June 2025

Procedures like this help standardize data products. Any user of this procedure will produce data in the same structure, regardless of the date range or point-of-sale. This is a primary benefit of reusing code. Code reuse will naturally be less error prone as the same logic gets executed every time.

Data Management: Update a table based on business criteria

Data management is the practice of ensuring your data is accurate, consistent, and efficiently accessed—qualities that are essential for any organization aiming to make data-driven decisions. Without strong data management, even the most advanced analytics or reporting efforts can be undermined by unreliable or inconsistent information. 

Let’s examine one example found across commercial industries where it is common for a business to establish a loyalty program to provide customers with benefits based on their tier. Airlines have frequent flier programs and most retail franchises have rewards programs, etc. As customers fly more with the same airline or purchase more items from the same franchise, customers earn more benefits.

Here is an example of how Stored Procedures can be used to manage and update a standard retail loyalty program. There are two procedures used to manage the customer loyalty tiers, one for updating a specific customer loyalty tier for the provided customerID, and the other one which updates the customer loyalty tier for all customers from a provided country.

Now let’s use the created procedure to update customer loyalty tiers for customers from Serbia, Germany and Canada, and then check the updated records:

The previous query produces the following result:

By encapsulating the update tier logic into respective procedures, we are avoiding code duplication while also removing complexity from the caller, who only needs to invoke the procedure with the appropriate procedure parameters.

What’s next?

With SQL Stored Procedures now in DBSQL, customers can continue to migrate legacy enterprise data warehouse workloads to the lakehouse. Based on customer feedback, there are several key capabilities we want to address as we move toward GA:

  • SQL Stored Procedures in Apache Spark™: Support for all SQL Procedures in open source
  • Support for SQL SECURITY DEFINER: Allow customers to execute procedure logic with the permissions of the procedure creator (definer)

Customers who want to share feedback or requests related to SQL Scripting and Procedures can do so through this form.

Two other important SQL constructs are closely tied to Stored Procedures: Temporary Tables and Multi-Statement (multi-table) Transactions. Both of these efforts are currently in Private Preview and accepting nominations. If you are interested, reach out to your Databricks account team.

Whether you are an existing Databricks user or migrating from another Data Warehouse, SQL Stored Procedures are a capability you should use to simplify how you manage complex SQL workflows. Get started with SQL Stored Procedures by reading the Databricks documentation.

To learn more about Databricks SQL, visit our website or read the documentation. You can also check out the product tour for Databricks SQL. If you want to migrate your existing warehouse to a high-performance, serverless data warehouse with a great user experience and lower total cost, then Databricks SQL is the solution — try it for free.

Never miss a Databricks post

Subscribe to the categories you care about and get the latest posts delivered to your inbox