Bucketing in Spark SQL 2.3

Bucketing is an optimization technique in Spark SQL that uses buckets and bucketing columns to determine data partitioning. When applied properly bucketing can lead to join optimizations by avoiding shuffles (aka exchanges) of tables participating in the join. The talk will give you the necessary information so you can use bucketing to optimize Spark SQL structured queries.

Session hashtag: #SAISDev12

« back
About Jacek Laskowski

Jacek is an independent consultant who offers development and training services for Apache Spark (and Scala, sbt with a bit of Hadoop YARN, Apache Kafka, Apache Hive, Apache Mesos, Akka Actors/Stream/HTTP, and Docker). He leads Warsaw Scala Enthusiasts and Warsaw Spark meetups. The latest project is to get in-depth understanding of Apache Spark in https://jaceklaskowski.gitbooks.io/mastering-apache-spark/.