compute(split: Partition, context: TaskContext): Iterator[InternalRow]
ShuffledRowRDD
ShuffledRowRDD is a specialized RDD of InternalRows.
|
Note
|
ShuffledRowRDD looks like ShuffledRDD, and the difference is in the type of the values to process, i.e. InternalRow and (K, C) key-value pairs, respectively.
|
ShuffledRowRDD takes a ShuffleDependency (of integer keys and InternalRow values).
|
Note
|
The dependency property is mutable and is of type ShuffleDependency[Int, InternalRow, InternalRow].
|
ShuffledRowRDD takes an optional specifiedPartitionStartIndices collection of integers that is the number of post-shuffle partitions. When not specified, the number of post-shuffle partitions is managed by the Partitioner of the input ShuffleDependency.
|
Note
|
Post-shuffle partition is…FIXME |
| Name | Description |
|---|---|
|
A single-element collection with |
|
CoalescedPartitioner (with the Partitioner of the |
numPreShufflePartitions Property
|
Caution
|
FIXME |
Computing Partition (in TaskContext) — compute Method
|
Note
|
compute is a part of RDD contract to compute a given partition in a TaskContext.
|
Internally, compute makes sure that the input split is a ShuffledRowRDDPartition. It then requests ShuffleManager for a ShuffleReader to read InternalRows for the split.
|
Note
|
compute uses SparkEnv to access the current ShuffleManager.
|
|
Note
|
compute uses ShuffleHandle (of ShuffleDependency dependency) and the pre-shuffle start and end partition offsets.
|
Getting Placement Preferences of Partition — getPreferredLocations Method
getPreferredLocations(partition: Partition): Seq[String]
|
Note
|
getPreferredLocations is a part of RDD contract to specify placement preferences (aka preferred task locations), i.e. where tasks should be executed to be as close to the data as possible.
|
Internally, getPreferredLocations requests MapOutputTrackerMaster for the preferred locations of the input partition (for the single ShuffleDependency).
|
Note
|
getPreferredLocations uses SparkEnv to access the current MapOutputTrackerMaster (which runs on the driver).
|
CoalescedPartitioner
|
Caution
|
FIXME |
ShuffledRowRDDPartition
|
Caution
|
FIXME |