Session

What’s New in PySpark: TVFs, Subqueries, Plots, and Profilers

Overview

Experience: In Person
Type: Breakout
Track: Data Engineering and Streaming
Industry: Enterprise Technology, Professional Services, Financial Services
Technologies: Apache Spark
Skill Level: Intermediate
Duration: 40 min

PySpark’s DataFrame API is evolving to support more expressive and modular workflows. In this session, we’ll introduce two powerful additions: table-valued functions (TVFs) and the new subquery API. You’ll learn how to define custom TVFs using Python User-Defined Table Functions (UDTFs), including support for polymorphism, and how subqueries can simplify complex logic.
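
As a rough illustration of the Python UDTF idea mentioned above, the sketch below defines a handler class whose `eval` method yields one output row per word. The class name `SplitWords`, the function name `split_words`, and the output columns are hypothetical and chosen only for this example; the registration step assumes PySpark 3.5 or later and an active `SparkSession`.

```python
class SplitWords:
    """Hypothetical UDTF handler: emits one (word, length) row per word."""

    def eval(self, text: str):
        # Each yielded tuple becomes one row of the function's output table.
        for word in text.split():
            yield (word, len(word))


def register_and_query(spark):
    """Wrap the handler as a UDTF and call it from SQL.

    Assumes PySpark 3.5+ and an active SparkSession passed in as `spark`.
    """
    # Imported lazily so the handler class above stays plain Python.
    from pyspark.sql.functions import udtf

    split_words = udtf(SplitWords, returnType="word: string, length: int")
    spark.udtf.register("split_words", split_words)
    # A table-valued function is invoked in the FROM clause.
    return spark.sql("SELECT * FROM split_words('table valued functions')")
```

Because the handler is an ordinary Python class, its row-producing logic can be exercised without a Spark cluster, for example with `list(SplitWords().eval("hello spark"))`.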

We’ll also explore how lateral joins connect these features, then turn to practical tools that improve the PySpark developer experience, such as plotting and profiling, and preview upcoming capabilities like UDF logging and a Python-native data source API.

Whether you're building production pipelines or extending PySpark itself, this talk will help you take full advantage of the latest features in the PySpark ecosystem.

Session Speakers

Takuya Ueshin

Sr. Software Engineer
Databricks

Xinrong Meng

Senior Software Engineer
Databricks