Repartition Logical Operators — Repartition and RepartitionByExpression

Repartition and RepartitionByExpression (repartition operations in short) are unary logical operators that produce a new RDD with exactly numPartitions partitions.

Note
RepartitionByExpression is also called the distribute operator.

Repartition is the result of coalesce or repartition (with no partition expressions defined) operators.

val rangeAlone = spark.range(5)

scala> rangeAlone.rdd.getNumPartitions
res0: Int = 8

// Repartition the records

val withRepartition = rangeAlone.repartition(numPartitions = 5)

scala> withRepartition.rdd.getNumPartitions
res1: Int = 5

scala> withRepartition.explain(true)
== Parsed Logical Plan ==
Repartition 5, true
+- Range (0, 5, step=1, splits=Some(8))

// ...

== Physical Plan ==
Exchange RoundRobinPartitioning(5)
+- *Range (0, 5, step=1, splits=Some(8))

// Coalesce the records

val withCoalesce = rangeAlone.coalesce(numPartitions = 5)
scala> withCoalesce.explain(true)
== Parsed Logical Plan ==
Repartition 5, false
+- Range (0, 5, step=1, splits=Some(8))

// ...

== Physical Plan ==
Coalesce 5
+- *Range (0, 5, step=1, splits=Some(8))

RepartitionByExpression is the result of the repartition operator with explicit partition expressions defined, or of SQL's DISTRIBUTE BY clause.

// RepartitionByExpression
// 1) Column-based partition expression only
scala> rangeAlone.repartition(partitionExprs = 'id % 2).explain(true)
== Parsed Logical Plan ==
'RepartitionByExpression [('id % 2)], 200
+- Range (0, 5, step=1, splits=Some(8))

// ...

== Physical Plan ==
Exchange hashpartitioning((id#10L % 2), 200)
+- *Range (0, 5, step=1, splits=Some(8))

// 2) Explicit number of partitions and partition expression
scala> rangeAlone.repartition(numPartitions = 2, partitionExprs = 'id % 2).explain(true)
== Parsed Logical Plan ==
'RepartitionByExpression [('id % 2)], 2
+- Range (0, 5, step=1, splits=Some(8))

// ...

== Physical Plan ==
Exchange hashpartitioning((id#10L % 2), 2)
+- *Range (0, 5, step=1, splits=Some(8))
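The examples above cover only the Dataset API. As a rough sketch of the SQL route, the following registers a temporary view and runs a DISTRIBUTE BY query, then counts the RepartitionByExpression nodes in the analyzed plan. The view name nums, the object name, and the helper countDistributeOperators are made up for illustration, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.RepartitionByExpression

object DistributeBySketch {
  // Hypothetical helper: counts RepartitionByExpression nodes
  // in the analyzed logical plan of a SQL query.
  def countDistributeOperators(spark: SparkSession, query: String): Int =
    spark.sql(query).queryExecution.analyzed.collect {
      case r: RepartitionByExpression => r
    }.size

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distribute-by-sketch")
      .getOrCreate()

    // "nums" is an illustrative view name, not part of the original text
    spark.range(5).createOrReplaceTempView("nums")
    val query = "SELECT id FROM nums DISTRIBUTE BY id % 2"

    // Print all plans, then the number of distribute operators found
    spark.sql(query).explain(true)
    println(countDistributeOperators(spark, query))

    spark.stop()
  }
}
```

No explicit number of partitions is given here, so the target should come from the session's shuffle-partitions setting, as in the column-based example above.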

Repartition and RepartitionByExpression logical operators are described by:

  • shuffle flag (Repartition only)

  • partition expressions (RepartitionByExpression only)

  • target number of partitions
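The properties listed above can be read straight off the logical plan. The following is a minimal sketch that pattern-matches on the Repartition case class to extract its target number of partitions and shuffle flag. The object name and the describe helper are illustrative only, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Repartition}

object RepartitionFieldsSketch {
  // Hypothetical helper: returns (numPartitions, shuffle) when the
  // top-level logical operator is Repartition, None otherwise.
  def describe(plan: LogicalPlan): Option[(Int, Boolean)] = plan match {
    case Repartition(numPartitions, shuffle, _) => Some((numPartitions, shuffle))
    case _                                      => None
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("repartition-fields-sketch")
      .getOrCreate()

    val rangeAlone = spark.range(5)

    // repartition without partition expressions: shuffle enabled
    println(describe(rangeAlone.repartition(5).queryExecution.logical))
    // coalesce: same operator, shuffle disabled
    println(describe(rangeAlone.coalesce(5).queryExecution.logical))

    spark.stop()
  }
}
```

The two printed values should mirror the `Repartition 5, true` and `Repartition 5, false` lines in the parsed logical plans shown earlier.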

Note
BasicOperators strategy maps Repartition to the ShuffleExchange physical operator (with RoundRobinPartitioning partitioning scheme) when shuffle is enabled, or to the CoalesceExec physical operator when it is not.

Note
BasicOperators strategy maps RepartitionByExpression to the ShuffleExchange physical operator with HashPartitioning partitioning scheme.
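One way to observe this mapping without depending on a specific physical operator class is to list the node names of the planned physical plan. The sketch below does that for the three cases. The object and helper names are made up for illustration, a local SparkSession is assumed, and matching on the "Exchange" and "Coalesce" node names is an assumption based on the explain output shown earlier.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object PhysicalMappingSketch {
  // Hypothetical helper: node names of the physical plan produced by the
  // planner (sparkPlan), before later preparation rules are applied.
  def physicalNodeNames(ds: Dataset[_]): Seq[String] =
    ds.queryExecution.sparkPlan.collect { case p => p.nodeName }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("physical-mapping-sketch")
      .getOrCreate()
    import spark.implicits._

    val rangeAlone = spark.range(5)

    // Repartition with shuffle enabled: expect an Exchange node
    println(physicalNodeNames(rangeAlone.repartition(5)))
    // Repartition with shuffle disabled: expect a Coalesce node
    println(physicalNodeNames(rangeAlone.coalesce(5)))
    // RepartitionByExpression: expect an Exchange node
    println(physicalNodeNames(rangeAlone.repartition(2, $"id" % 2)))

    spark.stop()
  }
}
```

Comparing node names rather than classes keeps the sketch loosely coupled to the Spark version, since the class behind the Exchange node may be named differently from one release to another.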

Repartition Operation Optimizations

  1. CollapseRepartition logical optimization collapses adjacent repartition operations.

  2. Repartition operations allow FoldablePropagation and PushDownPredicate logical optimizations to "push through".

  3. PropagateEmptyRelation logical optimization may result in an empty LocalRelation for repartition operations.
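The first optimization in the list can be observed by chaining two repartition calls and comparing the analyzed and optimized logical plans. This is a minimal sketch, assuming a local SparkSession. The object and helper names are illustrative and not part of Spark.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.Repartition

object CollapseRepartitionSketch {
  // Hypothetical helpers: Repartition nodes before and after optimization
  def analyzedRepartitions(ds: Dataset[_]): Seq[Repartition] =
    ds.queryExecution.analyzed.collect { case r: Repartition => r }

  def optimizedRepartitions(ds: Dataset[_]): Seq[Repartition] =
    ds.queryExecution.optimizedPlan.collect { case r: Repartition => r }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collapse-repartition-sketch")
      .getOrCreate()

    // Two adjacent repartition operations
    val twice = spark.range(5).repartition(8).repartition(3)

    // Number of Repartition nodes before and after the optimizer runs
    println(analyzedRepartitions(twice).size)
    println(optimizedRepartitions(twice).size)
    // Target number of partitions of whatever survives optimization
    optimizedRepartitions(twice).foreach(r => println(r.numPartitions))

    spark.stop()
  }
}
```

If CollapseRepartition applies, the optimized plan should contain a single Repartition node carrying the outermost target number of partitions.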
