Data + AI Summit 2022

Goodbye Hell of Unions in Spark SQL

On Demand

  • Session
  • Virtual
  • Data Engineering
  • Intermediate
  • 40 min

Overview

Applications that make heavy use of the Spark SQL union() operation are known to suffer from performance problems. The union() operation combines the rows of multiple DataFrames into a single table. When union() merges many DataFrames, the Spark SQL plan tree it generates can become huge even though the Spark SQL code itself is small, because the plan of every input DataFrame becomes part of the combined plan. Such a huge plan tree can lead to performance problems.
This talk reviews these performance problems from the perspective of Spark SQL query planning and explains how to avoid them using common practices.
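To make the "small code, big plan" effect concrete, here is a minimal sketch in plain Python that models a query plan as a tree. The `Node` class and the helpers `scan`, `union2`, `chained_union` and `flat_union` are hypothetical stand-ins, not Spark's API: chaining binary unions in a loop (in the spirit of repeated `acc = acc.union(df)` calls) builds a left-deep tree whose depth grows with the number of inputs, while a single n-ary union over the same inputs stays two levels deep.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List


@dataclass
class Node:
    """Hypothetical plan node used only for illustration (not Spark's API)."""
    op: str
    children: List["Node"] = field(default_factory=list)


def scan(i: int) -> Node:
    # Leaf standing in for one input DataFrame.
    return Node(f"Scan{i}")


def union2(left: Node, right: Node) -> Node:
    # Binary union node: models combining two inputs with one union() call.
    return Node("Union", [left, right])


def chained_union(n: int) -> Node:
    # Models a loop `acc = acc.union(df_i)`: builds a left-deep tree.
    return reduce(union2, (scan(i) for i in range(n)))


def flat_union(n: int) -> Node:
    # One n-ary union node whose children are all the inputs.
    return Node("Union", [scan(i) for i in range(n)])


def size(node: Node) -> int:
    # Total number of nodes in the tree.
    return 1 + sum(size(c) for c in node.children)


def depth(node: Node) -> int:
    # Number of levels from the root down to the deepest leaf.
    return 1 + max((depth(c) for c in node.children), default=0)


if __name__ == "__main__":
    for build in (chained_union, flat_union):
        plan = build(100)
        print(build.__name__, size(plan), depth(plan))
```

With 100 inputs, the chained form has 199 nodes and a depth of 100, while the flat form has 101 nodes and a depth of 2. Whether a real Spark plan ends up left-deep or flattened depends on the Spark version and its optimizer rules; the model only illustrates how the tree's shape follows from the way the unions are combined.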

Session Speakers

Kazuaki Ishizaki

Senior Researcher

