log4j.logger.org.apache.spark.util.collection.ExternalSorter=INFO
ExternalSorter
ExternalSorter
is a Spillable
of WritablePartitionedPairCollection
of K
-key / C
-value pairs.
When created ExternalSorter
expects three different types of data defined, i.e. K
, V
, C
, for keys, values, and combiner (partial) values, respectively.
Note
|
ExternalSorter is exclusively used when SortShuffleWriter writes records and BlockStoreShuffleReader reads combined key-value pairs (for reduce task when ShuffleDependency has key ordering defined (to sort output).
|
Tip
|
Enable Add the following line to Refer to Logging. |
stop
Method
Caution
|
FIXME |
writePartitionedFile
Method
Caution
|
FIXME |
Creating ExternalSorter Instance
ExternalSorter
takes the following:
-
TaskContext
-
Optional Aggregator
-
Optional Partitioner
-
Optional Scala’s Ordering
-
Optional Serializer
Note
|
ExternalSorter uses SparkEnv to access the default Serializer .
|
Note
|
ExternalSorter is created when SortShuffleWriter writes records and BlockStoreShuffleReader reads combined key-value pairs (for reduce task when ShuffleDependency has key ordering defined (to sort output).
|
spillMemoryIteratorToDisk
Internal Method
spillMemoryIteratorToDisk(inMemoryIterator: WritablePartitionedIterator): SpilledFile
Caution
|
FIXME |
spill
Method
spill(collection: WritablePartitionedPairCollection[K, C]): Unit
Note
|
spill is a part of Spillable contract.
|
Caution
|
FIXME |
insertAll
Method
insertAll(records: Iterator[Product2[K, V]]): Unit
Caution
|
FIXME |
Note
|
insertAll is used when SortShuffleWriter writes records and BlockStoreShuffleReader reads combined key-value pairs (for reduce task when ShuffleDependency has key ordering defined (to sort output).
|
Settings
Spark Property | Default Value | Description |
---|---|---|
|
Size of the in-memory buffer for each shuffle file output stream. In bytes unless the unit is specified. These buffers reduce the number of disk seeks and system calls made in creating intermediate shuffle files. Used in NOTE: |
|
|
Size of object batches when reading/writing from serializers. |