Announcing support for New UC Python UDF Features

Supercharge Your SQL UDFs with custom dependencies, batched execution, and UC Service Credentials

Summary

  • Run UC Python UDFs with custom Python dependencies
  • UC Python UDFs support batched mode for faster and more flexible execution
  • Use Unity Catalog service credentials to access cloud services from UC Python UDFs

Unity Catalog Python user-defined functions (UC Python UDFs) are increasingly used in modern data warehousing, running millions of queries daily across thousands of organizations. These functions allow users to harness the full power of Python from any Unity Catalog-enabled compute, including clusters, SQL warehouses, and DLT.

We are excited to announce several enhancements to UC Python UDFs that are now available in Public Preview on AWS, Azure, and GCP with Unity Catalog clusters running Databricks Runtime 16.3, SQL warehouses (2025.15), and Serverless notebooks and workflows:

  • Support for custom Python dependencies, installed from Unity Catalog Volumes or external sources.
  • Batch input mode, offering more flexibility and improved performance.
  • Secure access to external cloud services using Unity Catalog Service Credentials.

Each of these features unlocks new possibilities for working with data and external systems directly from SQL. Below, we’ll walk through the details and examples.

Using custom dependencies in UC Python UDFs

Users can now install and use custom Python dependencies in UC Python UDFs. You can install these packages from PyPI, Unity Catalog Volumes, and blob storage. The example function below installs the pycryptodome package from PyPI and uses it to compute SHA3-256 hashes:
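A minimal sketch of such a function is shown below; the three-level function name is a placeholder, and the ENVIRONMENT clause follows the documented syntax for dependency installation:

    CREATE OR REPLACE FUNCTION main.default.sha3_hash(input STRING)
    RETURNS STRING
    LANGUAGE PYTHON
    ENVIRONMENT (
      -- Install pycryptodome from PyPI into the UDF's Python environment
      dependencies = '["pycryptodome"]',
      environment_version = 'None'
    )
    AS $$
    from Crypto.Hash import SHA3_256

    if input is None:
        return None
    # Hash the UTF-8 encoded input and return the hex digest
    return SHA3_256.new(data=input.encode("utf-8")).hexdigest()
    $$;

Once created, the function can be called like any other SQL function, e.g. SELECT main.default.sha3_hash('hello').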

With this feature, you can define stable Python environments, avoid boilerplate code, and bring the capabilities of UC Python UDFs closer to session-based PySpark UDFs. Dependency installations are available starting with Databricks Runtime 16.3, on SQL warehouses, and in Serverless notebooks and workflows.

Introducing Batch UC Python UDFs

UC Python UDFs now allow functions to operate on batches of data, similar to vectorized Python UDFs in PySpark. The new function interface offers enhanced flexibility and provides several benefits:

  • Batched execution gives users more flexibility: UDFs can keep state between batches, e.g., performing expensive initialization work once on startup.
  • UDFs leveraging vectorized operations on pandas series can improve performance compared to row-at-a-time execution.
  • As shown in the cloud function call example below, sending batched data to cloud services can be more cost-effective than invoking them one row at a time.

Batch UC Python UDFs, now available on AWS, Azure, and GCP, are also known as Pandas UDFs or Vectorized Python UDFs. They are created by marking a UC Python UDF with PARAMETER STYLE PANDAS and specifying a HANDLER function, which is called by name. The handler is a Python function that receives an iterator of pandas Series, where each pandas Series corresponds to one batch. Handler functions are compatible with the pandas_udf API.

As an example, consider the UDF below, which calculates the population by state based on a JSON mapping that it downloads on startup:
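The sketch below illustrates the shape of such a UDF; the download URL and function names are placeholders:

    CREATE OR REPLACE FUNCTION main.default.population_by_state(state STRING)
    RETURNS BIGINT
    LANGUAGE PYTHON
    PARAMETER STYLE PANDAS
    HANDLER 'population_handler'
    AS $$
    import json
    import urllib.request
    from typing import Iterator
    import pandas as pd

    # Expensive initialization runs once on startup rather than once per row:
    # download a JSON object that maps state names to populations.
    with urllib.request.urlopen("https://example.com/population_by_state.json") as resp:
        POPULATION_BY_STATE = json.load(resp)

    def population_handler(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
        # Each element of the iterator is one batch of input rows as a pandas Series
        for states in batches:
            yield states.map(POPULATION_BY_STATE)
    $$;

Because the mapping is downloaded in the top-level body rather than inside the handler, it is fetched once per Python process instead of once per row or batch.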

Unity Catalog Service Credential access

Users can now leverage Unity Catalog service credentials in Batch UC Python UDFs to efficiently and securely access external cloud services. This functionality allows users to interact with cloud services directly from SQL.

UC Service Credentials are governed objects in Unity Catalog. They can provide access to any cloud service, such as key-value stores, key management services, or cloud functions. UC Service Credentials are available in all major clouds and are currently accessible from Batch UC Python UDFs; support for standard UC Python UDFs will follow in the future.

Service credentials are available to Batch UC Python UDFs using the CREDENTIALS clause in the UDF definition (AWS, Azure, GCP).

Example: Calling a cloud function from Batch UC Python UDFs

In our example, we will call a cloud function from a Batch UC Python UDF. This functionality allows for seamless integration with existing functions and enables the use of any base container, programming language, or environment.

With Unity Catalog, we can implement effective governance of both the Service Credential and UDF objects. Consider a scenario where Alice is the owner and definer of a UDF. Alice can grant Bob EXECUTE permission on the UDF. When Bob calls the UDF, Unity Catalog Lakeguard runs the UDF with Alice’s service credential permissions while ensuring that Bob cannot access the service credential directly: UDFs always use the defining user’s permissions to access credentials.

While all three major clouds are supported, we will focus on AWS in this example. In the following, we will walk through the steps to create and call the Lambda function.

Creating a UC service credential

As a prerequisite, we must set up a UC Service Credential with the appropriate permissions to execute Lambda functions. For this, we follow the instructions to set up a service credential called mycredential. Additionally, we allow our role to invoke functions by attaching the AWSLambdaRole policy.
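For illustration, assuming the IAM role backing the service credential is named my-uc-credential-role (a placeholder), the AWS-managed policy can be attached with the AWS CLI:

    # Allow the service credential's IAM role to invoke Lambda functions
    aws iam attach-role-policy \
      --role-name my-uc-credential-role \
      --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaRole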

Creating a Lambda function

In the second step, we create an AWS Lambda function through the AWS console. Our example Lambda function, HashValuesFunctionNode, uses the nodejs20.x runtime and computes a hash of its input data:
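A minimal sketch of such a Lambda is shown below (written as an ES module); the exact event shape is an assumption that the UDF in the next step follows:

    // HashValuesFunctionNode (nodejs20.x): hash each value in the request.
    // Expects an event of the form { "user": "...", "values": [...] }.
    import { createHash } from "node:crypto";

    export const handler = async (event) => {
      const hashes = (event.values ?? []).map((v) =>
        createHash("sha256").update(String(v)).digest("hex")
      );
      // Echo the invoking user's name, forwarded by the UDF, for attribution
      return { user: event.user, hashes };
    };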

Invoking a Lambda from a Batch UC Python UDF

In the third step, we can write a Batch UC Python UDF that calls the Lambda function. The UDF below makes the service credential available by specifying it in the CREDENTIALS clause. The UDF invokes the Lambda function once for each input batch; calling cloud functions with an entire batch of data can be more cost-efficient than calling them row by row. The example also demonstrates how to forward the invoking user’s name from Spark’s TaskContext to the Lambda function, which can be useful for attribution:
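The sketch below puts the pieces together. The function names, region, and TaskContext property key are placeholders, and we assume boto3 is available in the environment (it could otherwise be added via the ENVIRONMENT clause shown earlier):

    CREATE OR REPLACE FUNCTION main.default.hash_values(value STRING)
    RETURNS STRING
    LANGUAGE PYTHON
    PARAMETER STYLE PANDAS
    HANDLER 'lambda_handler'
    CREDENTIALS (`mycredential` DEFAULT)
    AS $$
    import json
    from typing import Iterator
    import boto3
    import pandas as pd
    from pyspark import TaskContext

    def lambda_handler(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
        # Because `mycredential` is marked DEFAULT, boto3's default credential
        # chain resolves to the UC service credential.
        client = boto3.client("lambda", region_name="us-east-1")
        # Forward the invoking user's name from Spark's TaskContext
        # (the property key "user" is an assumption for illustration)
        user = TaskContext.get().getLocalProperty("user")
        for batch in batches:
            # Send the whole batch in a single Lambda invocation
            payload = {"user": user, "values": batch.to_list()}
            response = client.invoke(
                FunctionName="HashValuesFunctionNode",
                Payload=json.dumps(payload),
            )
            result = json.loads(response["Payload"].read())
            # One output value per input row in the batch
            yield pd.Series(result["hashes"])
    $$;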

Get started today

Try out the Public Preview of the enhanced Python UDFs in Unity Catalog: install custom dependencies, leverage the batched input mode, or use UC service credentials!

Join the UC Compute and Spark product and engineering team at the Data + AI Summit, June 9–12 at the Moscone Center in San Francisco! Get a first look at the latest innovations in data and AI governance and security. Register now to secure your spot!
