MapStatus — Shuffle Map Output Status

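MapStatus is the status that a ShuffleMapTask reports for its map output: where the output is stored and the estimated size of each reduce block. There are two types of MapStatus: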

  • CompressedMapStatus that compresses the estimated size of every block to 8 bits (a single Byte) for efficient reporting.

  • HighlyCompressedMapStatus that stores only the average size of the non-empty blocks and a compressed bitmap that tracks which blocks are empty.
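
To make the 8-bit compression concrete, the following is a minimal sketch of one way to squeeze a block size into a single Byte. It assumes a logarithmic encoding with base 1.1, which is how I understand Spark's own size compression to work; the object and method names here are illustrative only and are not taken from this page.

```scala
// Sketch only: assumes a log-scale encoding with base 1.1.
// SizeCompression and its method names are hypothetical.
object SizeCompression {
  private val LogBase = 1.1

  // Encode a size in bytes as one Byte. 0 stays 0 (empty block),
  // sizes up to 1 map to 1, everything else maps to ceil(log_1.1(size)),
  // capped at 255 (stored as an unsigned value in a signed Byte).
  def compressSize(size: Long): Byte =
    if (size == 0L) 0
    else if (size <= 1L) 1
    else math.min(255, math.ceil(math.log(size.toDouble) / math.log(LogBase)).toInt).toByte

  // Decode the Byte back to an estimate. The mask reads the Byte as unsigned.
  def decompressSize(compressed: Byte): Long =
    if (compressed == 0) 0L
    else math.pow(LogBase, compressed & 0xFF).toLong
}
```

With this encoding the decoded value is an estimate rather than the exact size: rounding the exponent up means the estimate errs on the high side, by roughly 10% at most, until the cap of 255 is reached.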

When the number of blocks (the size of uncompressedSizes) is greater than 2000, HighlyCompressedMapStatus is chosen. Otherwise, CompressedMapStatus is used.
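
The selection rule can be sketched as follows. This is a simplified illustration, not Spark's code: PerBlockStatus, AveragedStatus and MapStatusSketch are hypothetical names, the per-block variant keeps exact sizes instead of 8-bit encoded ones, and a plain Set[Int] stands in for the compressed bitmap of empty blocks.

```scala
// Hypothetical, simplified stand-ins for the two MapStatus flavours.
sealed trait SimpleMapStatus {
  def getSizeForBlock(reduceId: Int): Long
}

// One size per block (the real CompressedMapStatus would keep one Byte per block).
final class PerBlockStatus(sizes: Array[Long]) extends SimpleMapStatus {
  def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
}

// Only the average size of non-empty blocks plus the set of empty block ids.
final class AveragedStatus(emptyBlocks: Set[Int], avgSize: Long) extends SimpleMapStatus {
  def getSizeForBlock(reduceId: Int): Long =
    if (emptyBlocks.contains(reduceId)) 0L else avgSize
}

object MapStatusSketch {
  // Threshold from the text: more than 2000 blocks selects the averaged form.
  val Threshold = 2000

  def choose(uncompressedSizes: Array[Long]): SimpleMapStatus =
    if (uncompressedSizes.length > Threshold) {
      val nonEmpty = uncompressedSizes.filter(_ > 0L)
      val avg = if (nonEmpty.nonEmpty) nonEmpty.sum / nonEmpty.length else 0L
      val empty = uncompressedSizes.indices.filter(i => uncompressedSizes(i) == 0L).toSet
      new AveragedStatus(empty, avg)
    } else {
      new PerBlockStatus(uncompressedSizes.clone())
    }
}
```

The trade-off is size against precision: the averaged form needs a constant amount of space for sizes regardless of the number of blocks, but every non-empty block reports the same estimated size.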

Note
2000 is a threshold on the number of blocks in a single map output, which is the number of reduce partitions of the shuffle. It is not the number of tasks in a job.

MapStatus Contract

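trait MapStatus {
  // Where the map output is stored (the BlockManager of the executor that ran the task)
  def location: BlockManagerId
  // Estimated size (in bytes) of the block destined for the given reduce partition
  def getSizeForBlock(reduceId: Int): Long
}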

Note
MapStatus is a private[spark] contract.

Table 1. MapStatus Contract

Method           Description
location         The BlockManager where a ShuffleMapTask ran and the result is stored.
getSizeForBlock  The estimated size for the reduce block (in bytes).
