val rangeAlone = spark.range(5)
scala> rangeAlone.rdd.getNumPartitions
res0: Int = 8
// Repartition the records
val withRepartition = rangeAlone.repartition(numPartitions = 5)
scala> withRepartition.rdd.getNumPartitions
res1: Int = 5
scala> withRepartition.explain(true)
== Parsed Logical Plan ==
Repartition 5, true
+- Range (0, 5, step=1, splits=Some(8))
// ...
== Physical Plan ==
Exchange RoundRobinPartitioning(5)
+- *Range (0, 5, step=1, splits=Some(8))
// Coalesce the records
val withCoalesce = rangeAlone.coalesce(numPartitions = 5)
scala> withCoalesce.explain(true)
== Parsed Logical Plan ==
Repartition 5, false
+- Range (0, 5, step=1, splits=Some(8))
// ...
== Physical Plan ==
Coalesce 5
+- *Range (0, 5, step=1, splits=Some(8))
Repartition Logical Operators — Repartition and RepartitionByExpression
Repartition and RepartitionByExpression (repartition operations in short) are unary logical operators that create a new RDD
that has exactly numPartitions partitions.
Note
|
RepartitionByExpression is also called distribute operator.
|
Repartition is the result of coalesce or repartition (with no partition expressions defined) operators.
RepartitionByExpression is the result of repartition operator with explicit partition expressions defined and SQL’s DISTRIBUTE BY clause.
// RepartitionByExpression
// 1) Column-based partition expression only
scala> rangeAlone.repartition(partitionExprs = 'id % 2).explain(true)
== Parsed Logical Plan ==
'RepartitionByExpression [('id % 2)], 200
+- Range (0, 5, step=1, splits=Some(8))
// ...
== Physical Plan ==
Exchange hashpartitioning((id#10L % 2), 200)
+- *Range (0, 5, step=1, splits=Some(8))
// 2) Explicit number of partitions and partition expression
scala> rangeAlone.repartition(numPartitions = 2, partitionExprs = 'id % 2).explain(true)
== Parsed Logical Plan ==
'RepartitionByExpression [('id % 2)], 2
+- Range (0, 5, step=1, splits=Some(8))
// ...
== Physical Plan ==
Exchange hashpartitioning((id#10L % 2), 2)
+- *Range (0, 5, step=1, splits=Some(8))
Repartition
and RepartitionByExpression
logical operators are described by:
Note
|
BasicOperators strategy maps Repartition to ShuffleExchange (with RoundRobinPartitioning partitioning scheme) or CoalesceExec physical operators per shuffle — enabled or not, respectively.
|
Note
|
BasicOperators strategy maps RepartitionByExpression to ShuffleExchange physical operator with HashPartitioning partitioning scheme.
|
Repartition Operation Optimizations
-
CollapseRepartition logical optimization collapses adjacent repartition operations.
-
Repartition operations allow FoldablePropagation and PushDownPredicate logical optimizations to "push through".
-
PropagateEmptyRelation logical optimization may result in an empty LocalRelation for repartition operations.