activeStorageStatusList: Seq[StorageStatus]
StorageListener — Spark Listener for Tracking Persistence Status of RDD Blocks
StorageListener
is a BlockStatusListener that uses SparkListener callbacks to track changes in the persistence status of RDD blocks in a Spark application.
Callback | Description |
---|---|
Updates _rddInfoMap with the update to a single block. |
|
Removes |
|
Updates _rddInfoMap registry with the names of every |
|
Removes the |
Name | Description |
---|---|
Lookup table of Used when…FIXME |
Creating StorageListener Instance
StorageListener
takes the following when created:
StorageListener
initializes the internal registries and counters.
Note
|
StorageListener is created when SparkUI is created.
|
Finding Active BlockManagers — activeStorageStatusList
Method
activeStorageStatusList
requests StorageStatusListener for active BlockManagers (on executors).
Note
|
|
Intercepting Block Status Update Events — onBlockUpdated
Callback
onBlockUpdated(blockUpdated: SparkListenerBlockUpdated): Unit
onBlockUpdated
creates a BlockStatus
(from the input SparkListenerBlockUpdated
) and updates registered RDDInfos (with block updates from BlockManagers) (passing in BlockId and BlockStatus
as a single-element collection of updated blocks).
Note
|
onBlockUpdated is a part of SparkListener contract to announce that there was a change in a block status (on a BlockManager on an executor).
|
Intercepting Stage Completed Events — onStageCompleted
Callback
onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit
onStageCompleted
finds the identifiers of the RDDs that have participated in the completed stage and removes them from _rddInfoMap registry as well as the RDDs that are no longer cached.
Note
|
onStageCompleted is a part of SparkListener contract to announce that a stage has finished.
|
Intercepting Stage Submitted Events — onStageSubmitted
Callback
onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit
onStageSubmitted
updates _rddInfoMap registry with the names of every RDDInfo
in stageSubmitted
, possibly adding new RDDInfos
if they were not registered yet.
Note
|
onStageSubmitted is a part of SparkListener contract to announce that the missing tasks of a stage were submitted for execution.
|
Intercepting Unpersist RDD Events — onUnpersistRDD
Callback
onUnpersistRDD(unpersistRDD: SparkListenerUnpersistRDD): Unit
onUnpersistRDD
removes the RDDInfo
from _rddInfoMap registry for the unpersisted RDD (from unpersistRDD
).
Note
|
onUnpersistRDD is a part of SparkListener contract to announce that an RDD has been unpersisted.
|
Updating Registered RDDInfos (with Block Updates from BlockManagers) — updateRDDInfo
Internal Method
updateRDDInfo(updatedBlocks: Seq[(BlockId, BlockStatus)]): Unit
updateRDDInfo
finds the RDDs for the input updatedBlocks
(for BlockIds).
Note
|
updateRDDInfo finds BlockIds that are RDDBlockIds.
|
updateRDDInfo
takes RDDInfo
entries (in _rddInfoMap registry) for which there are blocks in the input updatedBlocks
and updates RDDInfos (using StorageStatus) (from activeStorageStatusList).
Note
|
updateRDDInfo is used exclusively when StorageListener gets notified about a change in a block status (on a BlockManager on an executor).
|
Updating RDDInfos (using StorageStatus) — StorageUtils.updateRddInfo
Method
updateRddInfo(rddInfos: Seq[RDDInfo], statuses: Seq[StorageStatus]): Unit
Caution
|
FIXME |
Note
|
|