compute(split: Partition, context: TaskContext): Iterator[InternalRow]
ShuffledRowRDD
ShuffledRowRDD
is a specialized RDD of InternalRows.
Note
|
ShuffledRowRDD looks like ShuffledRDD, and the difference is in the type of the values to process, i.e. InternalRow and (K, C) key-value pairs, respectively.
|
ShuffledRowRDD
takes a ShuffleDependency (of integer keys and InternalRow values).
Note
|
The dependency property is mutable and is of type ShuffleDependency[Int, InternalRow, InternalRow] .
|
ShuffledRowRDD
takes an optional specifiedPartitionStartIndices
collection of integers that is the number of post-shuffle partitions. When not specified, the number of post-shuffle partitions is managed by the Partitioner of the input ShuffleDependency
.
Note
|
Post-shuffle partition is…FIXME |
Name | Description |
---|---|
|
A single-element collection with |
|
CoalescedPartitioner (with the Partitioner of the |
numPreShufflePartitions
Property
Caution
|
FIXME |
Computing Partition (in TaskContext
) — compute
Method
Note
|
compute is a part of RDD contract to compute a given partition in a TaskContext.
|
Internally, compute
makes sure that the input split
is a ShuffledRowRDDPartition. It then requests ShuffleManager
for a ShuffleReader
to read InternalRow
s for the split
.
Note
|
compute uses SparkEnv to access the current ShuffleManager .
|
Note
|
compute uses ShuffleHandle (of ShuffleDependency dependency) and the pre-shuffle start and end partition offsets.
|
Getting Placement Preferences of Partition — getPreferredLocations
Method
getPreferredLocations(partition: Partition): Seq[String]
Note
|
getPreferredLocations is a part of RDD contract to specify placement preferences (aka preferred task locations), i.e. where tasks should be executed to be as close to the data as possible.
|
Internally, getPreferredLocations
requests MapOutputTrackerMaster
for the preferred locations of the input partition
(for the single ShuffleDependency).
Note
|
getPreferredLocations uses SparkEnv to access the current MapOutputTrackerMaster (which runs on the driver).
|
CoalescedPartitioner
Caution
|
FIXME |
ShuffledRowRDDPartition
Caution
|
FIXME |