What’s New in PySpark: TVFs, Subqueries, Plots, and Profilers
Overview
Experience | In Person |
---|---|
Type | Breakout |
Track | Data Engineering and Streaming |
Industry | Enterprise Technology, Professional Services, Financial Services |
Technologies | Apache Spark |
Skill Level | Intermediate |
Duration | 40 min |
PySpark’s DataFrame API is evolving to support more expressive and modular workflows. In this session, we’ll introduce two powerful additions: table-valued functions (TVFs) and the new subquery API. You’ll learn how to define custom TVFs using Python User-Defined Table Functions (UDTFs), including support for polymorphism, and how subqueries can simplify complex logic.
We’ll also explore how lateral joins connect these features, then cover practical tools for the PySpark developer experience, such as plotting and profiling, and preview upcoming capabilities like UDF logging and a Python-native data source API.
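As a rough illustration of how a Python UDTF and a lateral join fit together, here is a minimal sketch. It is not taken from the session itself: the class `SplitWords`, the registered name `split_words`, the helper `run_with_spark`, and the sample rows are all hypothetical, and the Spark part assumes a PySpark version that ships `pyspark.sql.functions.udtf` (3.5 or later).

```python
class SplitWords:
    """Hypothetical UDTF handler: emits one output row per word in the input."""

    def eval(self, text: str):
        # A UDTF handler yields zero or more rows (tuples) per input row.
        for word in text.split():
            yield (word, len(word))


def run_with_spark(spark):
    """Register the handler as a table function and call it via a lateral join.

    Assumes `spark` is an active SparkSession; pyspark is imported lazily so
    the handler class above can be exercised without a Spark installation.
    """
    from pyspark.sql.functions import udtf

    # Wrap the plain class as a UDTF with an explicit output schema.
    split_words = udtf(SplitWords, returnType="word: string, n_chars: int")
    spark.udtf.register("split_words", split_words)

    # LATERAL lets the table function read columns of the row to its left.
    return spark.sql(
        "SELECT * FROM VALUES (1, 'hello spark'), (2, 'lateral join') t(id, text), "
        "LATERAL split_words(text)"
    )


# The handler logic can be checked on its own, without Spark:
rows = list(SplitWords().eval("hello spark"))
```

With a session available, `run_with_spark(SparkSession.builder.getOrCreate()).show()` would display one row per word alongside the originating `id` and `text` columns.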
Whether you're building production pipelines or extending PySpark itself, this talk will help you take full advantage of the latest features in the PySpark ecosystem.
Session Speakers
Takuya Ueshin
Sr. Software Engineer
Databricks
Xinrong Meng
Senior Software Engineer
Databricks