trait MapStatus {
  def location: BlockManagerId
  def getSizeForBlock(reduceId: Int): Long
}
MapStatus — Shuffle Map Output Status
MapStatus is the result of running a ShuffleMapTask that includes information about the BlockManager and estimated size of the reduce blocks.
There are two types of MapStatus:
- CompressedMapStatus that compresses the estimated map output size to 8 bits (Byte) for efficient reporting.
- HighlyCompressedMapStatus that stores the average size of non-empty blocks, and a compressed bitmap for tracking which blocks are empty.
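To make the "8 bits" idea concrete, here is a minimal sketch of one way a Long block size can be squeezed into a single Byte. It assumes a logarithmic encoding with base 1.1; the object name `SizeCompressionSketch`, the constant `LogBase` and the exact rounding are illustrative assumptions, not something this page states about Spark's own code.

```scala
object SizeCompressionSketch {
  // Assumed logarithm base: each step of the 8-bit code grows the size by 10%.
  private val LogBase = 1.1

  /** Encode a block size as a single byte; code 0 is reserved for empty blocks. */
  def compressSize(size: Long): Byte =
    if (size == 0L) 0.toByte
    else if (size <= 1L) 1.toByte
    else {
      // Round the exponent up so the decoded estimate does not fall below the real size.
      val exponent = math.ceil(math.log(size.toDouble) / math.log(LogBase)).toInt
      math.min(255, exponent).toByte
    }

  /** Decode the byte back into an estimated size in bytes. */
  def decompressSize(compressed: Byte): Long =
    if (compressed == 0) 0L
    // Mask with 0xFF so codes above 127 are read as unsigned values.
    else math.pow(LogBase, (compressed & 0xFF).toDouble).toLong
}
```

The trade-off in such a scheme is precision for space: the decoded value is only an estimate (within roughly 10% of the original for mid-range sizes), and sizes beyond what code 255 can express are capped.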
When the number of blocks (the size of uncompressedSizes) is greater than 2000, HighlyCompressedMapStatus is chosen.
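The selection rule can be sketched as a small factory. Everything below is a simplified stand-in written for illustration: `BlockManagerId` is a hypothetical case class (not Spark's class), `PerBlockStatus` and `AveragedStatus` are hypothetical names that play the roles of the two variants, and a plain `Set[Int]` replaces the compressed bitmap.

```scala
// Hypothetical stand-in for Spark's BlockManagerId, reduced to three fields.
final case class BlockManagerId(executorId: String, host: String, port: Int)

trait MapStatus {
  def location: BlockManagerId
  def getSizeForBlock(reduceId: Int): Long
}

// Keeps one size per reduce block (left uncompressed here for brevity).
final class PerBlockStatus(val location: BlockManagerId, sizes: Array[Long])
    extends MapStatus {
  def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
}

// Keeps only the average size of non-empty blocks plus the ids of empty blocks.
final class AveragedStatus(
    val location: BlockManagerId,
    emptyBlocks: Set[Int], // assumption: a Set instead of a compressed bitmap
    avgSize: Long)
    extends MapStatus {
  def getSizeForBlock(reduceId: Int): Long =
    if (emptyBlocks.contains(reduceId)) 0L else avgSize
}

object MapStatusSketch {
  // Threshold on the number of blocks, as described in the text.
  val Threshold = 2000

  def apply(loc: BlockManagerId, uncompressedSizes: Array[Long]): MapStatus =
    if (uncompressedSizes.length > Threshold) {
      val nonEmpty = uncompressedSizes.filter(_ > 0L)
      val avg = if (nonEmpty.isEmpty) 0L else nonEmpty.sum / nonEmpty.length
      val empty = uncompressedSizes.indices.filter(i => uncompressedSizes(i) == 0L).toSet
      new AveragedStatus(loc, empty, avg)
    } else {
      new PerBlockStatus(loc, uncompressedSizes.clone())
    }
}
```

In this sketch the per-block variant costs memory proportional to the number of blocks, while the averaged variant gives up per-block accuracy and reports the same estimate for every non-empty block.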
Caution: FIXME What exactly is 2000? Is this the number of tasks in a job?
MapStatus Contract
Note: MapStatus is a private[spark] contract.
Method | Description |
---|---|
location | The BlockManager where a ShuffleMapTask ran and the result is stored. |
getSizeForBlock | The estimated size for the reduce block (in bytes). |