DiskBlockManager

DiskBlockManager creates and maintains the logical mapping between logical blocks and physical on-disk locations.

By default, one block is mapped to one file with a name given by its BlockId. It is however possible to have a block map to only a segment of a file.

Block files are hashed among the local directories.

Note: DiskBlockManager is used exclusively by DiskStore and created when BlockManager is created (and passed to DiskStore).

Tip: Enable DEBUG logging level for the org.apache.spark.storage.DiskBlockManager logger to see what happens inside. Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.storage.DiskBlockManager=DEBUG

Refer to Logging.
Finding File — getFile Method

Caution: FIXME
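While the details are still a FIXME, the hashing mentioned in the introduction (block files hashed among the local directories) can be sketched as follows. This is a standalone illustration, not Spark's actual code; the localDirs paths and the subDirsPerLocalDir value are made up.

```scala
import java.io.File

// Standalone sketch: hash a block file name to one of the local directories
// and then to one of the subDirsPerLocalDir sub-directories inside it.
object GetFileSketch {
  val localDirs: Array[File] =
    Array(new File("/tmp/spark-local-0"), new File("/tmp/spark-local-1"))
  val subDirsPerLocalDir = 64 // hypothetical; configured via spark.diskStore.subDirectories

  def getFile(filename: String): File = {
    val hash = filename.hashCode & Integer.MAX_VALUE              // non-negative hash
    val dirId = hash % localDirs.length                           // which local directory
    val subDirId = (hash / localDirs.length) % subDirsPerLocalDir // which sub-directory inside it
    val subDir = new File(localDirs(dirId), "%02x".format(subDirId))
    new File(subDir, filename)
  }

  def main(args: Array[String]): Unit =
    println(getFile("shuffle_0_0_0.data"))
}
```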
createTempShuffleBlock Method
createTempShuffleBlock(): (TempShuffleBlockId, File)
createTempShuffleBlock creates a temporary TempShuffleBlockId block.
Caution: FIXME
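The idea behind the method can be sketched as below: keep generating random UUID-based block ids until one maps to a file that does not exist yet. The sketch uses a plain String id and a resolveFile stand-in for getFile instead of Spark's TempShuffleBlockId.

```scala
import java.io.File
import java.util.UUID

// Sketch: retry with fresh UUIDs until the resolved file does not exist yet.
// `resolveFile` stands in for DiskBlockManager.getFile; the "temp_shuffle_"
// prefix mirrors how TempShuffleBlockId names its files.
object TempShuffleBlockSketch {
  def createTempShuffleBlock(resolveFile: String => File): (String, File) = {
    var blockId = s"temp_shuffle_${UUID.randomUUID()}"
    while (resolveFile(blockId).exists()) {
      blockId = s"temp_shuffle_${UUID.randomUUID()}"
    }
    (blockId, resolveFile(blockId))
  }
}
```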
Collection of Locks for Local Directories — subDirs Internal Property
subDirs: Array[Array[File]]
subDirs is a collection of locks for every local directory where DiskBlockManager stores block data (one entry per local directory, each being an array of subDirsPerLocalDir slots).
Note: subDirs(n) is used to access the n-th local directory.
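A minimal sketch of that layout, with illustrative localDirs and subDirsPerLocalDir values (not Spark's actual code):

```scala
import java.io.File

// Sketch of the subDirs layout: one slot per local directory, each holding an
// array of subDirsPerLocalDir entries that are filled lazily as sub-directories
// get created. The inner arrays double as the locks.
object SubDirsSketch {
  val localDirs: Array[File] =
    Array(new File("/tmp/spark-local-0"), new File("/tmp/spark-local-1"))
  val subDirsPerLocalDir = 64

  val subDirs: Array[Array[File]] =
    Array.fill(localDirs.length)(new Array[File](subDirsPerLocalDir))

  // subDirs(n) gives access to the n-th local directory's sub-directory slots,
  // e.g. subDirs(0).synchronized { /* create or look up a sub-directory */ }
}
```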
getAllFiles Method
Caution: FIXME
Creating DiskBlockManager Instance
DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean)
When created, DiskBlockManager uses spark.diskStore.subDirectories to set subDirsPerLocalDir.
DiskBlockManager creates one or many local directories to store block data (as localDirs). When not successful, you should see the following ERROR message in the logs and DiskBlockManager exits with error code 53.

ERROR DiskBlockManager: Failed to create any local dir.
DiskBlockManager initializes the internal subDirs collection of locks for every local directory to store block data, with an array of subDirsPerLocalDir size for files.
In the end, DiskBlockManager registers a shutdown hook to clean up the local directories for blocks.
Registering Shutdown Hook — addShutdownHook Internal Method
addShutdownHook(): AnyRef
addShutdownHook registers a shutdown hook to execute doStop at shutdown.
When executed, you should see the following DEBUG message in the logs:
DEBUG DiskBlockManager: Adding shutdown hook
addShutdownHook adds a shutdown hook that, when triggered, prints the following INFO message and executes doStop.
INFO DiskBlockManager: Shutdown hook called
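The behaviour can be sketched with a plain JVM shutdown hook. Spark itself registers the hook through its ShutdownHookManager utility rather than Runtime directly; the doStop body below is a placeholder.

```scala
// Sketch of the shutdown-hook registration and what the hook does when it fires.
object ShutdownHookSketch {
  def doStop(): Unit =
    println("cleaning up local directories for blocks...") // placeholder for the real doStop

  def addShutdownHook(): Thread = {
    println("DEBUG DiskBlockManager: Adding shutdown hook")
    val hook = new Thread(() => {
      println("INFO DiskBlockManager: Shutdown hook called")
      doStop()
    })
    Runtime.getRuntime.addShutdownHook(hook)
    hook
  }
}
```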
Removing Local Directories for Blocks — doStop Internal Method
doStop(): Unit
doStop deletes the local directories recursively (only when the constructor’s deleteFilesOnStop is enabled and the parent directories are not registered to be removed at shutdown).
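A minimal sketch of that behaviour, assuming a simple recursive delete and ignoring the check for parent directories already registered for removal at shutdown:

```scala
import java.io.File

// Sketch of doStop: delete each local directory tree recursively, but only when
// deleteFilesOnStop (a constructor argument) is enabled.
object DoStopSketch {
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory)
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    f.delete()
  }

  def doStop(localDirs: Seq[File], deleteFilesOnStop: Boolean): Unit =
    if (deleteFilesOnStop) localDirs.foreach(deleteRecursively)
}
```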
Creating Directories for Blocks — createLocalDirs Internal Method
createLocalDirs(conf: SparkConf): Array[File]
createLocalDirs creates a blockmgr-[random UUID] directory under every local directory to store block data.
Internally, createLocalDirs reads local writable directories and creates a subdirectory blockmgr-[random UUID] under every configured parent directory.
If successful, you should see the following INFO message in the logs:
INFO DiskBlockManager: Created local directory at [localDir]
When it fails to create a local directory, you should see the following ERROR message in the logs:
ERROR DiskBlockManager: Failed to create local dir in [rootDir]. Ignoring this directory.
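The overall flow can be sketched as below. It is a simplification: directory creation is reduced to File.mkdirs, whereas Spark's own helper retries and randomizes names on conflict.

```scala
import java.io.File
import java.util.UUID

// Sketch of createLocalDirs: create a blockmgr-[random UUID] sub-directory under
// every configured root directory and keep only the ones that could be created.
object CreateLocalDirsSketch {
  def createLocalDirs(rootDirs: Seq[String]): Array[File] =
    rootDirs.flatMap { rootDir =>
      val localDir = new File(rootDir, s"blockmgr-${UUID.randomUUID()}")
      if (localDir.mkdirs()) {
        println(s"INFO DiskBlockManager: Created local directory at $localDir")
        Some(localDir)
      } else {
        println(s"ERROR DiskBlockManager: Failed to create local dir in $rootDir. Ignoring this directory.")
        None
      }
    }.toArray
}
```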
Getting Local Directories for Spark to Write Files — Utils.getConfiguredLocalDirs Internal Method
getConfiguredLocalDirs(conf: SparkConf): Array[String]
getConfiguredLocalDirs returns the local directories where Spark can write files.
Internally, getConfiguredLocalDirs uses the conf SparkConf to check whether External Shuffle Service is enabled (using spark.shuffle.service.enabled).
getConfiguredLocalDirs checks if Spark runs on YARN and, if so, returns the LOCAL_DIRS-controlled local directories.
In non-YARN mode (or for the driver in yarn-client mode), getConfiguredLocalDirs checks the following environment variables (in this order) and returns the value of the first one that is set:

- SPARK_EXECUTOR_DIRS environment variable
- SPARK_LOCAL_DIRS environment variable
- MESOS_DIRECTORY environment variable (only when External Shuffle Service is not used)
In the end, when no earlier environment variables were found, getConfiguredLocalDirs uses the spark.local.dir Spark property or eventually the java.io.tmpdir System property.
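That resolution order for the non-YARN case can be sketched as follows; the Mesos/External Shuffle Service nuance is left out, and the spark.local.dir value is passed in as a plain Option for illustration.

```scala
// Sketch of the directory-resolution order described above (non-YARN mode).
object ConfiguredLocalDirsSketch {
  def getConfiguredLocalDirs(sparkLocalDir: Option[String]): Array[String] =
    sys.env.get("SPARK_EXECUTOR_DIRS")
      .orElse(sys.env.get("SPARK_LOCAL_DIRS"))
      .orElse(sys.env.get("MESOS_DIRECTORY"))
      .orElse(sparkLocalDir)                            // spark.local.dir Spark property
      .getOrElse(System.getProperty("java.io.tmpdir"))  // last resort
      .split(",")
}
```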
Getting Writable Directories in YARN — getYarnLocalDirs Internal Method
getYarnLocalDirs(conf: SparkConf): String
getYarnLocalDirs uses the conf SparkConf to read the LOCAL_DIRS environment variable with comma-separated local directories (that have already been created and secured so that only the user has access to them).
getYarnLocalDirs throws an Exception with the message Yarn Local dirs can’t be empty if the LOCAL_DIRS environment variable was not set.
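A minimal sketch of that check, reading the environment directly instead of going through SparkConf:

```scala
// Sketch of getYarnLocalDirs: read the comma-separated LOCAL_DIRS environment
// variable set up by YARN and fail when it is missing or empty.
object YarnLocalDirsSketch {
  def getYarnLocalDirs(): String = {
    val localDirs = sys.env.getOrElse("LOCAL_DIRS", "")
    if (localDirs.isEmpty) throw new Exception("Yarn Local dirs can't be empty")
    localDirs
  }
}
```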
Checking If Spark Runs on YARN — isRunningInYarnContainer Internal Method
isRunningInYarnContainer(conf: SparkConf): Boolean
isRunningInYarnContainer uses the conf SparkConf to read Hadoop YARN’s CONTAINER_ID environment variable to find out if Spark runs in a YARN container.
Note: The CONTAINER_ID environment variable is exported by the YARN NodeManager.
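In other words, the check boils down to whether that variable is present, as in the following sketch (Spark reads it via SparkConf.getenv rather than sys.env):

```scala
// Sketch: the presence of YARN's CONTAINER_ID environment variable is taken
// as the signal that Spark runs inside a YARN container.
object YarnContainerCheckSketch {
  def isRunningInYarnContainer: Boolean =
    sys.env.get("CONTAINER_ID").exists(_.nonEmpty)
}
```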
getAllBlocks Method
getAllBlocks(): Seq[BlockId]
getAllBlocks lists all the blocks currently stored on disk.
Internally, getAllBlocks takes the block files and returns their names (as BlockId).
Note: getAllBlocks is used when BlockManager computes the ids of existing blocks (for a given filter).
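A sketch of the idea, built on top of a getAllFiles helper over the subDirs structure described earlier; block ids are kept as plain Strings instead of Spark's BlockId.

```scala
import java.io.File

// Sketch of getAllBlocks on top of getAllFiles: walk every created sub-directory
// of the local directories and turn each file name back into a block id.
object GetAllBlocksSketch {
  def getAllFiles(subDirs: Array[Array[File]]): Seq[File] =
    subDirs.flatten
      .filter(_ != null)
      .flatMap(dir => Option(dir.listFiles()).getOrElse(Array.empty[File]))
      .toSeq

  def getAllBlocks(subDirs: Array[Array[File]]): Seq[String] =
    getAllFiles(subDirs).map(_.getName)
}
```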
Settings

| Spark Property | Default Value | Description |
| --- | --- | --- |
| spark.diskStore.subDirectories | 64 | The number of …FIXME |
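The property can be set like any other Spark property, for example on a SparkConf before the SparkContext is created (the value 128 below is only illustrative):

```scala
import org.apache.spark.SparkConf

// Example: raise the number of sub-directories per local directory.
val conf = new SparkConf()
  .setAppName("disk-store-demo")
  .set("spark.diskStore.subDirectories", "128")
```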