Managing repetitive SQL tasks—like cleaning data, updating business rules, or running batch logic can be tedious and error-prone if you’re copying and pasting code.
Now, SQL Stored Procedures in Databricks let you store that logic once, run it whenever you need, and keep it governed under Unity Catalog.
Whether you're cleaning up data before analytics, updating tables based on business criteria, or moving workloads from a legacy enterprise data warehouse, stored procedures make the process simpler, more consistent, and easier to maintain.
Databricks supports open standards and interoperability, avoiding proprietary or vendor-specific implementations. SQL Stored Procedures follow the ANSI/PSM SQL standard and will be contributed to open source Apache Spark™.
Procedures are widely used in administrative tasks, data management, and ETL workflows—especially in enterprise data warehouses (EDWs). For customers moving from EDWs to Databricks, existing stored procedures can be migrated without rewriting, making the transition simpler. And as always, the best data warehouse is a lakehouse.
For one of our critical use cases around customer segmentation, we leveraged SQL Stored Procedures with DBSQL to achieve better performance, scalability, and cost efficiency. Being familiar with SQL helped us implement and deploy the solution to production in a very short time. Using Stored Procedures has enabled us to manage complex logic more effectively while keeping the overall architecture streamlined and maintainable. —SambaSiva Rao, Sr. Data Engineer/Architect, ClicTechnologies
Across data processing workflows, customers can struggle to maintain consistency and performance of repetitive tasks and complex logic. Stored procedures are a great approach in these cases, ensuring data is processed in a consistent, standardized way, and performance is optimal.
For data cleaning tasks, procedures can apply transformations such as converting inconsistent date formats into a standardized structure, trimming leading and trailing whitespaces from text fields, and replacing or correcting erroneous values. This ensures that your data is prepared for downstream analysis. See the detailed ETL example below.
On the data management side, stored procedures can efficiently update table values based on defined business rules—such as flagging outdated records, recalculating fields, or synchronizing data across related tables. By encapsulating these operations into procedures, teams can ensure consistent execution, reduce manual intervention, and improve data quality at scale. See the detailed data management example below, using stored procedures to update a loyalty/membership program.
So what are Procedures? They are pre-compiled collections of SQL statements that allow a user to manage their SQL logic into a single, reusable unit. Procedures are stored in Unity Catalog, meaning they are governed and fully encapsulate permissions. When a stored procedure is called, the database executes these pre-defined operations, offering the benefit of enhanced security, simplified maintenance of complex workloads, and the potential for improved performance.
There are 5 core commands that support procedures: CREATE, CALL, DESCRIBE, SHOW, and DROP.
When creating a procedure, you can use several parameter types to control the input and output.
These parameters can be assigned to:
The logic encapsulated within a SQL Stored Procedure is built on top of SQL Scripting. A stored procedure can be considered as a reusable script with parameters, governed by Unity Catalog. You can read about Scripting in these two introductory blogs:
Nested and recursive procedure calls are supported, meaning that customers can organize their units of work or business logic conveniently into separate procedures, making the entire SQL execution flow more modular. This improves readability and maintenance.
Procedures are grouped with Functions in Unity Catalog in the UI. However, procedures and functions, while letting you reuse SQL logic, serve different purposes.
A function is used to return a value or a table. It must be used within a SQL query and cannot include dynamic SQL or procedural logic. A procedure, by contrast, is used to execute a sequence of SQL statements. It can include control flow, variables, loops, and dynamic SQL using IDENTIFIER and EXECUTE IMMEDIATE. You call a procedure as a standalone command, typically to perform a task or workflow.
Now that we’ve covered the capabilities of SQL Stored Procedures, let’s explore some examples to demonstrate their value and the problems they help solve.
You can use this notebook to follow along - it contains all examples from this post, as well as data preparation commands.
If you follow the typical medallion architecture, you know moving data from Bronze to Silver (or Silver to Gold) can require cleaning, transforming, aggregating, and formatting data. Stored Procedures are great for managing repetitive processes like these within an ETL workflow.
In this ETL scenario, a Procedure is used to:
Procedures like this help standardize data products. Any user of this procedure will produce data in the same structure, regardless of the date range or point-of-sale. This is a primary benefit of reusing code. Code reuse will naturally be less error prone as the same logic gets executed every time.
Data management is the practice of ensuring your data is accurate, consistent, and efficiently accessed—qualities that are essential for any organization aiming to make data-driven decisions. Without strong data management, even the most advanced analytics or reporting efforts can be undermined by unreliable or inconsistent information.
Let’s examine one example found across commercial industries where it is common for a business to establish a loyalty program to provide customers with benefits based on their tier. Airlines have frequent flier programs and most retail franchises have rewards programs, etc. As customers fly more with the same airline or purchase more items from the same franchise, customers earn more benefits.
Here is an example of how Stored Procedures can be used to manage and update a standard retail loyalty program. There are two procedures used to manage the customer loyalty tiers, one for updating a specific customer loyalty tier for the provided customerID, and the other one which updates the customer loyalty tier for all customers from a provided country.
Now let’s use the created procedure to update customer loyalty tiers for customers from Serbia, Germany and Canada, and then check the updated records:
The previous query produces the following result:
By encapsulating the update tier logic into respective procedures, we are avoiding code duplication while also removing complexity from the caller, who only needs to invoke the procedure with the appropriate procedure parameters.
With SQL Stored Procedures now in DBSQL, customers can continue to migrate legacy enterprise data warehouse workloads to the lakehouse. Based on customer feedback, there are several key capabilities we want to address as we move toward GA:
Customers who want to share feedback or requests related to SQL Scripting and Procedures can do so through this form.
Two other important SQL constructs are closely tied to Stored Procedures: Temporary Tables and Multi-Statement (multi-table) Transactions. Both of these efforts are currently in Private Preview and accepting nominations. If you are interested, reach out to your Databricks account team.
Whether you are an existing Databricks user or migrating from another Data Warehouse, SQL Stored Procedures are a capability you should use to simplify how you manage complex SQL workflows. Get started with SQL Stored Procedures by reading the Databricks documentation.
To learn more about Databricks SQL, visit our website or read the documentation. You can also check out the product tour for Databricks SQL. If you want to migrate your existing warehouse to a high-performance, serverless data warehouse with a great user experience and lower total cost, then Databricks SQL is the solution — try it for free.