Partitioning — Specification of Physical Operator’s Output Partitions

Partitioning is specification that describes how a physical operator's output is split across partitions.

package org.apache.spark.sql.catalyst.plans.physical

sealed trait Partitioning {
  val numPartitions: Int
  def satisfies(required: Distribution): Boolean
  def compatibleWith(other: Partitioning): Boolean
  def guarantees(other: Partitioning): Boolean
}
Table 1. Partitioning Contract (in alphabetical order)
Method Description

compatibleWith

Used mainly in Partitioning.allCompatible

guarantees

Used mainly when EnsureRequirements physical preparation rule enforces partition requirements of a physical operator

numPartitions

Number of partitions that the data is split across

Used in:

satisfies

Used mainly when EnsureRequirements physical preparation rule enforces partition requirements of a physical operator

Table 2. Partitioning Schemes (Partitioning’s Available Implementations) and Their Properties
Partitioning compatibleWith guarantees numPartitions satisfies

BroadcastPartitioning

BroadcastPartitioning with the same BroadcastMode

Exactly the same BroadcastPartitioning

1

BroadcastDistribution with the same BroadcastMode

HashPartitioning

  • clustering expressions

  • numPartitions

HashPartitioning (when their underlying expressions are semantically equal, i.e. deterministic and canonically equal)

HashPartitioning (when their underlying expressions are semantically equal, i.e. deterministic and canonically equal)

Input numPartitions

  • UnspecifiedDistribution

  • ClusteredDistribution with all the hashing expressions included in clustering expressions

PartitioningCollection

  • partitionings

Any Partitioning that is compatible with one of the input partitionings

Any Partitioning that is guaranteed by any of the input partitionings

Number of partitions of the first Partitioning in the input partitionings

Any Distribution that is satisfied by any of the input partitionings

RangePartitioning

  • ordering collection of SortOrder

  • numPartitions

RangePartitioning (when semantically equal, i.e. underlying expressions are deterministic and canonically equal)

RangePartitioning (when semantically equal, i.e. underlying expressions are deterministic and canonically equal)

Input numPartitions

  • UnspecifiedDistribution

  • OrderedDistribution with requiredOrdering that matches the input ordering

  • ClusteredDistribution with all the children of the input ordering semantically equal to one of the clustering expressions

RoundRobinPartitioning

  • numPartitions

Always negative

Always negative

Input numPartitions

UnspecifiedDistribution

SinglePartition

Any Partitioning with exactly one partition

Any Partitioning with exactly one partition

1

Any Distribution except BroadcastDistribution

UnknownPartitioning

  • numPartitions

Always negative

Always negative

Input numPartitions

UnspecifiedDistribution

results matching ""

    No results matching ""