Session
Language-Agnostic UDF Protocol for Spark
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Enterprise Technology |
| Technologies | Databricks SQL, Unity Catalog |
| Skill Level | Advanced |
User-defined functions (UDFs) are a fundamental extension mechanism in Spark, enabling users to express logic that goes beyond built-in operators. However, the current UDF execution model falls short of what the multi-language architecture of Spark Connect requires. Existing implementations are tightly coupled to specific runtimes (e.g., PySpark), rely on ad-hoc communication protocols, and embed language-specific logic into planning and execution.

This proposal introduces a language-agnostic UDF execution model for Spark. It is built around three core ideas: (1) planning UDFs based on execution shape rather than programming language, (2) a structured IPC protocol for execution, and (3) a declarative worker specification that decouples the Spark engine from language runtimes.

By providing a clean and extensible foundation for IPC-based UDF execution across languages, the proposal aims to standardize multi-language support and establish a sustainable direction for the UDF ecosystem.
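To make the three ideas concrete, here is a minimal Python sketch. It is not the proposal's actual design: `ExecutionShape`, `WorkerSpec`, `plan_udf`, the length-prefixed JSON framing, and the `my_udf_worker` module name are all hypothetical and chosen only for illustration. It shows a plan that depends on the UDF's shape rather than its language, a worker described purely as data, and messages exchanged in a fixed frame format.

```python
import io
import json
import struct
from dataclasses import dataclass
from enum import Enum


class ExecutionShape(Enum):
    # Hypothetical shapes: how rows flow through a UDF, regardless of language.
    SCALAR = "scalar"        # one output value per input row
    AGGREGATE = "aggregate"  # many input rows reduced to one value
    TABLE = "table"          # one input row expanded to zero or more rows


@dataclass(frozen=True)
class WorkerSpec:
    # Hypothetical declarative description of a language worker.
    language: str
    command: tuple
    shapes: frozenset


def plan_udf(shape, spec):
    # Planning looks only at the shape and the declared capabilities,
    # never at language-specific details.
    if shape not in spec.shapes:
        raise ValueError(f"{spec.language} worker does not support {shape.value}")
    return {"shape": shape.value, "command": list(spec.command)}


def write_frame(stream, message):
    # Assumed framing: 4-byte big-endian length, then a UTF-8 JSON payload.
    payload = json.dumps(message).encode("utf-8")
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_frame(stream):
    # Read one frame written by write_frame and decode it.
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream closed before a full frame header arrived")
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream closed before the full payload arrived")
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    spec = WorkerSpec(
        language="python",
        command=("python", "-m", "my_udf_worker"),  # hypothetical module name
        shapes=frozenset({ExecutionShape.SCALAR}),
    )
    print(plan_udf(ExecutionShape.SCALAR, spec))

    # An in-memory buffer stands in for the pipe or socket to the worker.
    channel = io.BytesIO()
    write_frame(channel, {"type": "init", "shape": "scalar"})
    channel.seek(0)
    print(read_frame(channel))
```

Under these assumptions, supporting another language would mean supplying a new `WorkerSpec` and a worker that speaks the same framing, with no change to `plan_udf`.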
Session Speakers
Haiyang Sun
Senior Software Engineer
Databricks
Tian Gao
Senior Software Engineer
Databricks