BlockDataManager — Block Storage Management API

BlockDataManager is a pluggable interface for managing the storage of blocks of data (the block storage management API). Each block is identified by a BlockId, which carries a globally unique identifier (name), and is stored as a ManagedBuffer.

Table 1. Types of BlockIds

Name                  Description
RDDBlockId            Described by rddId and splitIndex. Created when an RDD is requested to getOrCompute a partition (identified by splitIndex).
ShuffleBlockId        Described by shuffleId, mapId and reduceId.
ShuffleDataBlockId    Described by shuffleId, mapId and reduceId.
ShuffleIndexBlockId   Described by shuffleId, mapId and reduceId.
BroadcastBlockId      Described by broadcastId and an optional field.
TaskResultBlockId     Described by taskId.
StreamBlockId         Described by streamId and uniqueId.
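
To make the idea of a globally unique name concrete, the following is a minimal standalone sketch of a few of the block id types listed above. It does not use Spark's own classes. The field types (Int, Long) and the name formats (for example rdd_<rddId>_<splitIndex>) are assumptions modelled on Spark's BlockId source and may differ between Spark versions. The enclosing object BlockIdSketch is a hypothetical name used only for this illustration.

```scala
// Simplified, standalone model of some block id types (NOT Spark's real classes).
// Field types and name formats are assumptions for illustration only.
object BlockIdSketch {

  // Every block id exposes a globally unique name.
  sealed trait BlockId {
    def name: String
  }

  final case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
    def name: String = s"rdd_${rddId}_${splitIndex}"
  }

  final case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockId {
    def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}"
  }

  // Same fields as ShuffleBlockId; only the suffix of the name differs.
  final case class ShuffleIndexBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockId {
    def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}.index"
  }

  // The optional field is modelled as a String that defaults to empty.
  final case class BroadcastBlockId(broadcastId: Long, field: String = "") extends BlockId {
    def name: String =
      s"broadcast_${broadcastId}" + (if (field.isEmpty) "" else s"_${field}")
  }

  final case class TaskResultBlockId(taskId: Long) extends BlockId {
    def name: String = s"taskresult_${taskId}"
  }

  def main(args: Array[String]): Unit = {
    val ids: Seq[BlockId] = Seq(
      RDDBlockId(rddId = 3, splitIndex = 7),
      ShuffleBlockId(shuffleId = 0, mapId = 1, reduceId = 2),
      BroadcastBlockId(broadcastId = 5, field = "piece0"),
      TaskResultBlockId(taskId = 42L)
    )
    // Print the unique name that identifies each block.
    ids.foreach(id => println(id.name))
  }
}
```

Because the names are derived only from the identifying fields, two ids built from the same fields yield the same name, which is what lets a name serve as a lookup key for a block.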

Note
BlockManager is currently the only available implementation of BlockDataManager.
Note
org.apache.spark.network.BlockDataManager is a private[spark] Scala trait in Spark.

BlockDataManager Contract

Every BlockDataManager offers the following services:

  • getBlockData fetches the data of a local block by blockId.

    getBlockData(blockId: BlockId): ManagedBuffer
  • putBlockData stores block data locally under blockId. It returns true if the operation succeeded and false if it failed.

    putBlockData(
      blockId: BlockId,
      data: ManagedBuffer,
      level: StorageLevel,
      classTag: ClassTag[_]): Boolean
  • releaseLock releases the lock acquired by a getBlockData or putBlockData operation.

    releaseLock(blockId: BlockId): Unit
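
The contract above can be illustrated with a small in-memory sketch. Since BlockDataManager is private[spark], the code below does not implement the real trait. It defines a hypothetical SimpleBlockDataManager with the same three operations, uses a plain byte array in place of ManagedBuffer, and leaves out the StorageLevel and ClassTag parameters. The locking model (a per-block counter incremented by each successful get or put and decremented by releaseLock) and the lockCount helper are assumptions made for illustration, not a description of how BlockManager tracks locks.

```scala
import scala.collection.mutable

// Standalone sketch of the contract (NOT Spark's real trait or BlockManager).
object BlockDataManagerSketch {

  // Hypothetical stand-ins: a block id is just its name, a buffer is a byte array.
  final case class BlockId(name: String)
  type Buffer = Array[Byte]

  // Mirrors the three operations of the contract in simplified form.
  trait SimpleBlockDataManager {
    def getBlockData(blockId: BlockId): Buffer
    def putBlockData(blockId: BlockId, data: Buffer): Boolean
    def releaseLock(blockId: BlockId): Unit
  }

  final class InMemoryBlockDataManager extends SimpleBlockDataManager {
    private val blocks = mutable.Map.empty[BlockId, Buffer]
    // Assumed locking model: number of outstanding holds per block.
    private val holds = mutable.Map.empty[BlockId, Int]

    private def addHold(blockId: BlockId): Unit =
      holds(blockId) = holds.getOrElse(blockId, 0) + 1

    // Fetches a local block and records one hold on it.
    def getBlockData(blockId: BlockId): Buffer = synchronized {
      val data = blocks.getOrElse(
        blockId,
        throw new NoSuchElementException(s"Block ${blockId.name} not found"))
      addHold(blockId)
      data
    }

    // Stores a block locally; returns false if the block already exists.
    def putBlockData(blockId: BlockId, data: Buffer): Boolean = synchronized {
      if (blocks.contains(blockId)) {
        false
      } else {
        blocks(blockId) = data.clone()
        addHold(blockId)
        true
      }
    }

    // Releases one hold taken by getBlockData or putBlockData.
    def releaseLock(blockId: BlockId): Unit = synchronized {
      val current = holds.getOrElse(blockId, 0)
      if (current > 0) holds(blockId) = current - 1
    }

    // Hypothetical helper for inspecting the sketch; not part of the contract.
    def lockCount(blockId: BlockId): Int = synchronized {
      holds.getOrElse(blockId, 0)
    }
  }

  def main(args: Array[String]): Unit = {
    val manager = new InMemoryBlockDataManager
    val id = BlockId("rdd_0_0")

    val stored = manager.putBlockData(id, Array[Byte](1, 2, 3))
    manager.releaseLock(id) // release the hold taken by the put

    val bytes = manager.getBlockData(id)
    manager.releaseLock(id) // release the hold taken by the get

    println(s"stored=$stored, size=${bytes.length}, holds=${manager.lockCount(id)}")
  }
}
```

In this sketch every successful getBlockData or putBlockData is paired with one releaseLock call, which mirrors the way the contract couples releaseLock to the other two operations.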

