Data + AI Summit 2022
Watch on demand

Goodbye Hell of Unions in Spark SQL

On Demand

Type

  • Session

Format

  • Virtual

Track

  • Data Engineering

Difficulty

  • Intermediate

Duration

  • 40 min

Overview

Applications that make heavy use of the Spark SQL union() operation are known to run into performance problems. The union() operation combines the rows of multiple DataFrames into a single table. When union() merges many DataFrames, the generated Spark SQL planning tree can become huge even though the Spark SQL code itself is small, and this huge planning tree may lead to performance problems.
This talk reviews these performance problems from the Spark SQL planning perspective and explains how to avoid them with common practices.
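
To make the gap between small code and a large planning tree concrete, here is a toy sketch in plain Python. It is not Spark's planner and does not reproduce the talk's own examples. The names `Plan`, `scan`, `union`, `union_many`, and `self_union` are hypothetical, and the model rests on one assumption: a plan is a tree, so a DataFrame referenced twice contributes its whole subtree twice.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Plan:
    """Toy stand-in for a logical plan node (hypothetical, not a Spark class)."""
    op: str
    children: Tuple["Plan", ...] = ()

    def size(self) -> int:
        # Count this node plus every node below it, as a tree walk would.
        return 1 + sum(child.size() for child in self.children)


def scan(name: str) -> Plan:
    # Leaf node: reading one input table.
    return Plan(f"Scan({name})")


def union(left: Plan, right: Plan) -> Plan:
    # Binary union node over two child plans.
    return Plan("Union", (left, right))


def union_many(k: int) -> Plan:
    # A short loop that unions k distinct inputs: the tree grows linearly in k.
    plan = scan("t0")
    for i in range(1, k):
        plan = union(plan, scan(f"t{i}"))
    return plan


def self_union(rounds: int) -> Plan:
    # Each round unions the running result with a filtered copy of itself,
    # so the subtree is embedded twice and the tree roughly doubles per round.
    plan = scan("t0")
    for _ in range(rounds):
        plan = union(plan, Plan("Filter", (plan,)))
    return plan


if __name__ == "__main__":
    print(union_many(100).size())  # linear growth
    print(self_union(10).size())   # exponential growth
```

In this model, both loops are only a few lines long, yet the second one produces a tree whose node count grows exponentially with the number of rounds. Anything that walks the whole tree pays that cost, which is the kind of planning overhead the abstract refers to.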

Session Speakers

Kazuaki Ishizaki

Senior Researcher

IBM
