log4j.logger.org.apache.spark.shuffle.sort.ShuffleExternalSorter=INFO
ShuffleExternalSorter — Cache-Efficient Sorter
ShuffleExternalSorter is a specialized cache-efficient sorter that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, ShuffleExternalSorter can fit more of the array into cache.
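The 8-bytes-per-record idea can be illustrated with a minimal sketch that packs a partition id and a record address into a single long. The class name and the 24/13/27 bit split (partition id, memory page number, offset in page) are assumptions made for this illustration; they follow my understanding of Spark's PackedRecordPointer and are not taken from this page.

```java
// Hypothetical illustration, not Spark's own class.
final class PackedPointerSketch {
    // Assumed bit widths: 24 (partition id) + 13 (page number) + 27 (offset) = 64 bits.
    static final int OFFSET_BITS = 27;
    static final int PAGE_BITS = 13;
    static final int PARTITION_BITS = 24;

    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;
    static final long PAGE_MASK = (1L << PAGE_BITS) - 1;
    static final long PARTITION_MASK = (1L << PARTITION_BITS) - 1;

    // Packs the three fields into one 8-byte value, partition id in the top bits.
    static long pack(int partitionId, int pageNumber, long offsetInPage) {
        if (partitionId < 0 || partitionId > PARTITION_MASK) {
            throw new IllegalArgumentException("partition id out of range");
        }
        if (pageNumber < 0 || pageNumber > PAGE_MASK) {
            throw new IllegalArgumentException("page number out of range");
        }
        if (offsetInPage < 0 || offsetInPage > OFFSET_MASK) {
            throw new IllegalArgumentException("offset out of range");
        }
        return ((long) partitionId << (PAGE_BITS + OFFSET_BITS))
            | ((long) pageNumber << OFFSET_BITS)
            | offsetInPage;
    }

    // Unsigned shift so that partition ids using the top bit decode correctly.
    static int partitionId(long packed) {
        return (int) (packed >>> (PAGE_BITS + OFFSET_BITS));
    }

    static int pageNumber(long packed) {
        return (int) ((packed >>> OFFSET_BITS) & PAGE_MASK);
    }

    static long offsetInPage(long packed) {
        return packed & OFFSET_MASK;
    }
}
```

Because the partition id sits in the most significant bits, a sorter only has to look at the long values themselves to group records by partition, without dereferencing the records.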
ShuffleExternalSorter is a MemoryConsumer.
Tip
|
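Enable INFO logging level for the org.apache.spark.shuffle.sort.ShuffleExternalSorter logger.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.shuffle.sort.ShuffleExternalSorter=INFO
Refer to Logging. |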
getMemoryUsage Method
Caution
|
FIXME |
closeAndGetSpills Method
Caution
|
FIXME |
insertRecord Method
Caution
|
FIXME |
freeMemory Method
Caution
|
FIXME |
getPeakMemoryUsedBytes Method
Caution
|
FIXME |
writeSortedFile Method
Caution
|
FIXME |
cleanupResources Method
Caution
|
FIXME |
Creating ShuffleExternalSorter Instance
ShuffleExternalSorter takes the following when created:
- memoryManager — TaskMemoryManager
- blockManager — BlockManager
- taskContext — TaskContext
- initialSize
- numPartitions
- writeMetrics — ShuffleWriteMetrics
ShuffleExternalSorter initializes itself as a MemoryConsumer (with pageSize as the minimum of PackedRecordPointer.MAXIMUM_PAGE_SIZE_BYTES and pageSizeBytes, and Tungsten memory mode).
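The page-size rule can be sketched as a single min computation. The class and method names are hypothetical, and the 128 MiB cap (1 << 27 bytes) is an assumption about the value of PackedRecordPointer.MAXIMUM_PAGE_SIZE_BYTES rather than something stated on this page.

```java
// Hypothetical illustration of how the MemoryConsumer page size is capped.
final class PageSizeSketch {
    // Assumed value of PackedRecordPointer.MAXIMUM_PAGE_SIZE_BYTES (128 MiB).
    static final int MAXIMUM_PAGE_SIZE_BYTES = 1 << 27;

    // pageSizeBytes stands for the page size reported by the memory manager.
    static int effectivePageSize(long pageSizeBytes) {
        return (int) Math.min(MAXIMUM_PAGE_SIZE_BYTES, pageSizeBytes);
    }
}
```

Under that assumption a 64 MiB page size is used as-is, while a 1 GiB page size would be capped at 128 MiB so that every offset within a page still fits in a packed record pointer.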
ShuffleExternalSorter uses the spark.shuffle.file.buffer (for fileBufferSizeBytes) and spark.shuffle.spill.numElementsForceSpillThreshold (for numElementsForSpillThreshold) Spark properties.
ShuffleExternalSorter creates a ShuffleInMemorySorter (with the spark.shuffle.sort.useRadixSort Spark property enabled by default).
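To give a feel for what radix sorting means for such a pointer array, here is a minimal sketch of a stable least-significant-digit radix sort over the partition id bits only. It assumes the partition id occupies the three most significant bytes of each 8-byte entry; the class name is hypothetical, and this is not Spark's RadixSort implementation.

```java
import java.util.Arrays;

// Hypothetical illustration: orders packed pointers by partition id only.
final class PartitionRadixSortSketch {
    // Assumption: bytes 5..7 (bits 40..63) of each long hold the partition id.
    static long[] sortByPartition(long[] pointers) {
        long[] src = Arrays.copyOf(pointers, pointers.length);
        long[] dst = new long[pointers.length];
        for (int byteIndex = 5; byteIndex <= 7; byteIndex++) {
            int shift = byteIndex * 8;
            // counts[v + 1] accumulates how many entries have byte value v.
            int[] counts = new int[257];
            for (long p : src) {
                counts[(int) ((p >>> shift) & 0xFF) + 1]++;
            }
            // Prefix sums: counts[v] becomes the first output index for byte value v.
            for (int i = 1; i < counts.length; i++) {
                counts[i] += counts[i - 1];
            }
            // Stable scatter pass: equal bytes keep their relative order.
            for (long p : src) {
                int v = (int) ((p >>> shift) & 0xFF);
                dst[counts[v]++] = p;
            }
            long[] tmp = src;
            src = dst;
            dst = tmp;
        }
        return src;
    }
}
```

Only three passes are needed because the sort key is three bytes wide, and the unsigned shifts keep partition ids that use the top bit in the correct order.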
ShuffleExternalSorter initializes the internal registries and counters.
Note
|
ShuffleExternalSorter is created when UnsafeShuffleWriter is opened (which happens when UnsafeShuffleWriter is created).
|
Freeing Execution Memory by Spilling To Disk — spill Method
long spill(long size, MemoryConsumer trigger) throws IOException
Note
|
spill is part of the MemoryConsumer contract to sort and spill the current records due to memory pressure.
|
spill frees execution memory, updates TaskMetrics, and in the end returns the spill size.
Note
|
spill returns 0 when ShuffleExternalSorter has no ShuffleInMemorySorter or the ShuffleInMemorySorter manages no records.
|
You should see the following INFO message in the logs:
INFO Thread [id] spilling sort data of [memoryUsage] to disk ([size] times so far)
spill writes the sorted file (with isLastFile disabled).
spill frees memory and records the spill size.
spill resets the internal ShuffleInMemorySorter (which in turn frees up the underlying in-memory pointer array).
spill returns the spill size.
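The sequence of steps in spill can be sketched with a toy, in-memory stand-in. Everything here is hypothetical: the class, the no-argument spill signature, the list that plays the role of spill files, and the byte accounting (8 bytes per slot of the pointer array) are simplifications for illustration and do not reproduce Spark's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of the spill steps, with no real disk I/O.
final class SpillSketch {
    private long[] records = new long[0];          // stand-in for the in-memory sorter's pointer array
    private int numRecords = 0;
    private long memoryUsed = 0L;                  // bytes held by the pointer array (toy accounting)
    final List<long[]> spills = new ArrayList<>(); // stand-in for spill files on disk
    long memoryBytesSpilled = 0L;                  // stand-in for the task metrics counter

    void insert(long record) {
        if (numRecords == records.length) {
            records = Arrays.copyOf(records, Math.max(4, records.length * 2));
        }
        records[numRecords++] = record;
        memoryUsed = 8L * records.length;
    }

    long spill() {
        // Nothing to spill: report 0 bytes freed.
        if (numRecords == 0) {
            return 0L;
        }
        // "Write the sorted file" (never the last file when spilling).
        long[] sorted = Arrays.copyOf(records, numRecords);
        Arrays.sort(sorted);
        spills.add(sorted);
        // Free memory and remember how much was released.
        long spillSize = memoryUsed;
        memoryUsed = 0L;
        // Reset the in-memory sorter, dropping its pointer array.
        records = new long[0];
        numRecords = 0;
        // Record the spill in the metrics and return its size.
        memoryBytesSpilled += spillSize;
        return spillSize;
    }
}
```

Calling spill on an empty instance returns 0, which mirrors the note above about a ShuffleInMemorySorter that manages no records.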