log4j.logger.org.apache.spark.util.collection.ExternalSorter=INFO
ExternalSorter
ExternalSorter is a Spillable of WritablePartitionedPairCollection of K-key / C-value pairs.
When created ExternalSorter expects three different types of data defined, i.e. K, V, C, for keys, values, and combiner (partial) values, respectively.
|
Note
|
ExternalSorter is exclusively used when SortShuffleWriter writes records and BlockStoreShuffleReader reads combined key-value pairs (for reduce task when ShuffleDependency has key ordering defined (to sort output).
|
|
Tip
|
Enable Add the following line to Refer to Logging. |
stop Method
|
Caution
|
FIXME |
writePartitionedFile Method
|
Caution
|
FIXME |
Creating ExternalSorter Instance
ExternalSorter takes the following:
-
TaskContext
-
Optional Aggregator
-
Optional Partitioner
-
Optional Scala’s Ordering
-
Optional Serializer
|
Note
|
ExternalSorter uses SparkEnv to access the default Serializer.
|
|
Note
|
ExternalSorter is created when SortShuffleWriter writes records and BlockStoreShuffleReader reads combined key-value pairs (for reduce task when ShuffleDependency has key ordering defined (to sort output).
|
spillMemoryIteratorToDisk Internal Method
spillMemoryIteratorToDisk(inMemoryIterator: WritablePartitionedIterator): SpilledFile
|
Caution
|
FIXME |
spill Method
spill(collection: WritablePartitionedPairCollection[K, C]): Unit
|
Note
|
spill is a part of Spillable contract.
|
|
Caution
|
FIXME |
insertAll Method
insertAll(records: Iterator[Product2[K, V]]): Unit
|
Caution
|
FIXME |
|
Note
|
insertAll is used when SortShuffleWriter writes records and BlockStoreShuffleReader reads combined key-value pairs (for reduce task when ShuffleDependency has key ordering defined (to sort output).
|
Settings
| Spark Property | Default Value | Description |
|---|---|---|
|
Size of the in-memory buffer for each shuffle file output stream. In bytes unless the unit is specified. These buffers reduce the number of disk seeks and system calls made in creating intermediate shuffle files. Used in NOTE: |
|
|
Size of object batches when reading/writing from serializers. |