Unity Catalog Python user-defined functions (UC Python UDFs) are increasingly used in modern data warehousing, running millions of queries daily across thousands of organizations. These functions allow users to harness the full power of Python from any Unity Catalog-enabled compute, including clusters, SQL warehouses and DLT.
We are excited to announce several enhancements to UC Python UDFs that are now available in Public Preview on AWS, Azure, and GCP with Unity Catalog clusters running Databricks Runtime 16.3, SQL warehouses (2025.15), and Serverless notebooks and workflows:

- Custom dependencies: install Python packages from PyPI, Unity Catalog Volumes, or cloud storage
- Batch (vectorized) execution: process batches of data with pandas, similar to PySpark's vectorized Python UDFs
- Service credentials: securely access external cloud services directly from UDF code
Each of these features unlocks new possibilities for working with data and external systems directly from SQL. Below, we’ll walk through the details and examples.
Users can now install and use custom Python dependencies in UC Python UDFs. You can install these packages from PyPI, Unity Catalog Volumes, and blob storage. The example function below installs the pycryptodome package from PyPI to return SHA3-256 hashes:
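The following sketch illustrates the idea: a scalar UC Python UDF that declares pycryptodome as a dependency in its environment and uses it to hash its input (the function name and version pin are illustrative, not verbatim from the preview docs):

```sql
-- Illustrative sketch of declaring a PyPI dependency for a UC Python UDF.
CREATE OR REPLACE FUNCTION sha3_256_hash(value STRING)
RETURNS STRING
LANGUAGE PYTHON
ENVIRONMENT (
  dependencies = '["pycryptodome==3.22.0"]',
  environment_version = 'None')
AS $$
from Crypto.Hash import SHA3_256

h = SHA3_256.new()
h.update(bytes(value, 'utf-8'))
return h.hexdigest()
$$;

SELECT sha3_256_hash('hello');
```

Dependencies are resolved once when the UDF's environment is provisioned, so per-row execution does not pay the installation cost.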
With this feature, you can define stable Python environments, avoid boilerplate code, and bring the capabilities of UC Python UDFs closer to session-based PySpark UDFs. Dependency installations are available starting with Databricks Runtime 16.3, on SQL warehouses, and in Serverless notebooks and workflows.
UC Python UDFs now allow functions to operate on batches of data, similar to vectorized Python UDFs in PySpark. The new function interface offers enhanced flexibility and provides several benefits:

- Better performance: expensive setup work, such as downloading data or creating clients, runs once per process instead of once per row
- Efficient external calls: batching reduces the number of requests sent to external services
- A familiar interface: handler functions are compatible with the PySpark pandas_udf API
Batch UC Python UDFs (also known as pandas UDFs or vectorized Python UDFs) are now available on AWS, Azure, and GCP. You define one by marking a UC Python UDF with PARAMETER STYLE PANDAS and specifying a HANDLER function to be called by name. The handler is a Python function that receives an iterator of pandas Series, where each Series corresponds to one batch, and it is compatible with the pandas_udf API.
As an example, consider the UDF below, which calculates the population by state based on a JSON object mapping that it downloads on startup:
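A minimal sketch of such a batch UDF follows; the function name and the mapping URL are hypothetical placeholders:

```sql
-- Illustrative sketch of a Batch UC Python UDF with per-process setup.
CREATE OR REPLACE FUNCTION state_population(state STRING)
RETURNS BIGINT
LANGUAGE PYTHON
PARAMETER STYLE PANDAS
HANDLER 'handler'
AS $$
import json
import urllib.request

# Downloaded once per UDF process at startup, then reused for every batch.
URL = 'https://example.com/state_population.json'  # hypothetical endpoint
with urllib.request.urlopen(URL) as resp:
    POPULATION = json.load(resp)  # e.g. {"CA": 39538223, "TX": 29145505, ...}

def handler(batch_iter):
    # batch_iter yields one pandas Series per input batch.
    for states in batch_iter:
        yield states.map(POPULATION)
$$;
```

Because the JSON mapping is fetched outside the handler, the download cost is amortized across all batches processed by the same worker.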
Users can now leverage Unity Catalog service credentials in Batch UC Python UDFs to efficiently and securely access external cloud services. This functionality allows users to interact with cloud services directly from SQL.
UC service credentials are governed objects in Unity Catalog. They can provide access to any cloud service, such as key-value stores, key management services, or cloud functions. UC service credentials are available in all major clouds and are currently accessible from Batch UC Python UDFs; support for scalar UC Python UDFs will follow.
Service credentials are available to Batch UC Python UDFs using the CREDENTIALS clause in the UDF definition (AWS, Azure, GCP).
In our example, we will call a cloud function from a Batch UC Python UDF. This functionality allows for seamless integration with existing functions and enables the use of any base container, programming language, or environment.
With Unity Catalog, we can implement effective governance of both service credential and UDF objects. In the figure above, Alice is the owner and definer of the UDF. Alice can grant Bob EXECUTE permission on the UDF. When Bob calls the UDF, Unity Catalog Lakeguard runs it with Alice's service-credential permissions while ensuring that Bob cannot access the service credential directly. In other words, UDFs use the defining user's permissions to access credentials.
While all three major clouds are supported, this example focuses on AWS. In the following, we walk through the steps to create and call the Lambda function.
As a prerequisite, we must set up a UC service credential with the appropriate permissions to execute Lambda functions. For this, we follow the instructions to set up a service credential called mycredential. Additionally, we allow our role to invoke functions by attaching the AWSLambdaRole policy.
In the second step, we create an AWS Lambda function through the AWS UI. Our example Lambda, HashValuesFunctionNode, runs on the nodejs20.x runtime and computes a hash of its input data:
In the third step, we can now write a Batch UC Python UDF that calls the Lambda function. The UDF below makes the service credential available by specifying it in the CREDENTIALS clause. The UDF invokes the Lambda function once per input batch; calling cloud functions with an entire batch of data can be more cost-efficient than calling them row by row. The example also demonstrates how to forward the invoking user's name from Spark's TaskContext to the Lambda function, which can be useful for attribution:
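The sketch below illustrates the shape of such a UDF. The region, payload format, and the TaskContext property key used to obtain the invoking user are assumptions for this example, not verbatim from the preview docs:

```sql
-- Illustrative sketch of a Batch UC Python UDF using a UC service credential.
CREATE OR REPLACE FUNCTION hash_values(value STRING)
RETURNS STRING
LANGUAGE PYTHON
PARAMETER STYLE PANDAS
HANDLER 'handler'
CREDENTIALS (`mycredential` DEFAULT)
AS $$
import json
import boto3
import pandas as pd
from databricks.service_credentials import getServiceCredentialsProvider
from pyspark.taskcontext import TaskContext

# One boto3 session per UDF process, authenticated via the UC service credential.
session = boto3.Session(
    botocore_session=getServiceCredentialsProvider('mycredential'),
    region_name='us-east-1')  # region is an example
lambda_client = session.client('lambda')

def handler(batch_iter):
    # Forward the invoking user for attribution (property key is an assumption).
    user = TaskContext.get().getLocalProperty('user')
    for batch in batch_iter:
        # One Lambda invocation per batch instead of per row.
        response = lambda_client.invoke(
            FunctionName='HashValuesFunctionNode',
            Payload=json.dumps({'user': user, 'values': batch.tolist()}))
        result = json.loads(response['Payload'].read())
        yield pd.Series(result['hashes'])
$$;
```

Batching the payload keeps the number of Lambda invocations proportional to the number of batches rather than the number of rows.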
Try out the Public Preview of enhanced Python UDFs in Unity Catalog: install custom dependencies, leverage the batched input mode, or use UC service credentials!
Join the UC Compute and Spark product and engineering team at the Data + AI Summit, June 9–12 at the Moscone Center in San Francisco! Get a first look at the latest innovations in data and AI governance and security. Register now to secure your spot!