ExecutorRunnable
ExecutorRunnable starts a YARN container with the CoarseGrainedExecutorBackend standalone application.
ExecutorRunnable is created when YarnAllocator launches Spark executors in allocated YARN containers (and for debugging purposes when ApplicationMaster requests cluster resources for executors).
If the external shuffle service is used, it is set in the ContainerLaunchContext as a service under the name spark_shuffle.
Note
|
Despite the name, ExecutorRunnable is not a java.lang.Runnable anymore after SPARK-12447.
|
Tip
|
Enable logging for ExecutorRunnable to see what happens inside (this section mentions messages at the INFO and DEBUG levels).
Refer to Logging. |
Creating ExecutorRunnable Instance
ExecutorRunnable takes the following when created:
- YARN Container to run a Spark executor in
- sparkConf (SparkConf)
- masterAddress
- executorId
- hostname of the YARN container
- appId
ExecutorRunnable initializes the internal registries and counters.
Note
|
The executorMemory and executorCores input arguments come from YarnAllocator, but they are really the values of the spark.executor.memory and spark.executor.cores properties. |
Note
|
Most of the input parameters are exactly as YarnAllocator was created with.
|
Building Command to Run CoarseGrainedExecutorBackend in YARN Container — prepareCommand
Internal Method
prepareCommand(
masterAddress: String,
slaveId: String,
hostname: String,
executorMemory: Int,
executorCores: Int,
appId: String): List[String]
prepareCommand prepares the command that is used to start the org.apache.spark.executor.CoarseGrainedExecutorBackend application in a YARN container. All the input parameters of prepareCommand become the command-line arguments of the CoarseGrainedExecutorBackend application.
prepareCommand builds the command that will be executed in a YARN container.
Note
|
JVM options are defined using -Dkey=value format.
|
prepareCommand builds the -Xmx JVM option using executorMemory (in MB).
Note
|
prepareCommand uses executorMemory that is given when ExecutorRunnable is created.
|
prepareCommand adds the optional spark.executor.extraJavaOptions property to the JVM options (if defined).
prepareCommand adds the optional SPARK_JAVA_OPTS environment variable to the JVM options (if defined).
prepareCommand adds the optional spark.executor.extraLibraryPath to the library path (changing the path to be YARN NodeManager-aware).
prepareCommand adds -Djava.io.tmpdir=<LOG_DIR>./tmp to the JVM options.
prepareCommand adds all the Spark properties for executors to the JVM options.
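The way the JVM options accumulate can be illustrated with a minimal sketch. It is not Spark's code: the object and method names are hypothetical, extra options are split on plain whitespace (an assumption that ignores quoting), and the SPARK_JAVA_OPTS, library path and java.io.tmpdir steps are left out for brevity.

```scala
object JvmOptionsSketch {
  // Formats one property in the -Dkey=value form used for JVM options.
  def systemProperty(key: String, value: String): String = s"-D$key=$value"

  // Collects JVM options in the order described in the text:
  // -Xmx first, then the optional extra Java options, then the -D properties.
  def jvmOptions(
      executorMemoryMb: Int,
      extraJavaOptions: Option[String],
      executorProps: Seq[(String, String)]): Seq[String] = {
    val xmx = Seq(s"-Xmx${executorMemoryMb}m")
    // Assumption: a plain whitespace split; quoted options are not handled.
    val extra = extraJavaOptions.toSeq
      .flatMap(_.trim.split("\\s+").toSeq)
      .filter(_.nonEmpty)
    val props = executorProps.map { case (k, v) => systemProperty(k, v) }
    xmx ++ extra ++ props
  }
}
```

Which Spark properties count as "properties for executors" is not modeled here; the caller of this sketch passes them in explicitly.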
Note
|
prepareCommand uses SparkConf that is given when ExecutorRunnable is created.
|
prepareCommand adds -Dspark.yarn.app.container.log.dir=<LOG_DIR> to the JVM options.
prepareCommand adds -XX:MaxPermSize=256m unless it is already defined, or an IBM JVM or Java 8 is used.
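The MaxPermSize condition can be sketched as a small pure function. The names are hypothetical, and the JVM vendor and version checks are passed in as flags rather than detected, since the text does not say how they are determined.

```scala
object MaxPermSizeSketch {
  // Returns the option to append, or None when it should be skipped.
  def maxPermSizeOption(
      existingOpts: Seq[String],
      isIbmJvm: Boolean,
      isJava8: Boolean): Option[String] = {
    // "Already defined" is taken to mean any option mentioning -XX:MaxPermSize.
    val alreadyDefined = existingOpts.exists(_.contains("-XX:MaxPermSize"))
    if (alreadyDefined || isIbmJvm || isJava8) None
    else Some("-XX:MaxPermSize=256m")
  }
}
```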
prepareCommand reads the list of URIs representing the user classpath and adds --user-class-path and file:[path] for every entry.
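As an illustration of the user classpath step, the following sketch expands each path into the two arguments named above. The object and method names are hypothetical, and the paths are taken as plain strings rather than URIs.

```scala
object UserClassPathSketch {
  // Emits "--user-class-path" followed by "file:[path]" for every entry.
  def userClassPathArgs(paths: Seq[String]): Seq[String] =
    paths.flatMap(path => Seq("--user-class-path", s"file:$path"))
}
```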
prepareCommand adds -XX:OnOutOfMemoryError to the JVM options unless already defined.
In the end, prepareCommand combines the parts together to build the entire command with the following (in order):
- Extra library path
- JAVA_HOME/bin/java
- -server
- JVM options
- org.apache.spark.executor.CoarseGrainedExecutorBackend
- --driver-url followed by masterAddress
- --executor-id followed by executorId
- --hostname followed by hostname
- --cores followed by executorCores
- --app-id followed by appId
- --user-class-path with the arguments
- 1><LOG_DIR>/stdout
- 2><LOG_DIR>/stderr
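The ordering above can be made concrete with a short sketch that concatenates the parts. This is only an illustration under stated assumptions: the object, method and parameter names are hypothetical, JAVA_HOME/bin/java and <LOG_DIR> are kept as the literal placeholders used in the text, and the library path prefix, JVM options and user classpath arguments are assumed to be prepared by the caller.

```scala
object PrepareCommandSketch {
  // Concatenates the command parts in the order listed in the text.
  def assemble(
      libraryPathPrefix: Seq[String],
      javaOpts: Seq[String],
      masterAddress: String,
      executorId: String,
      hostname: String,
      executorCores: Int,
      appId: String,
      userClassPathArgs: Seq[String],
      logDir: String = "<LOG_DIR>"): List[String] = {
    val parts =
      libraryPathPrefix ++
        Seq("JAVA_HOME/bin/java", "-server") ++
        javaOpts ++
        Seq(
          "org.apache.spark.executor.CoarseGrainedExecutorBackend",
          "--driver-url", masterAddress,
          "--executor-id", executorId,
          "--hostname", hostname,
          "--cores", executorCores.toString,
          "--app-id", appId) ++
        userClassPathArgs ++
        // Redirect the container's stdout and stderr into the log directory.
        Seq(s"1>$logDir/stdout", s"2>$logDir/stderr")
    parts.toList
  }
}
```

Keeping the redirections as the last two elements means the whole list can be joined with spaces into a single shell command line.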
Note
|
prepareCommand uses the arguments for --driver-url , --executor-id , --hostname , --cores and --app-id as given when ExecutorRunnable is created.
|
Note
|
You can see the result of prepareCommand as command in the INFO message in the logs when ApplicationMaster registers itself with YARN ResourceManager (to print it out once and avoid flooding the logs when starting Spark executors).
|
Note
|
prepareCommand is used when ExecutorRunnable starts CoarseGrainedExecutorBackend in a YARN resource container and (only for debugging purposes) when ExecutorRunnable builds launch context diagnostic information (to print it out as an INFO message to the logs).
|
Collecting Environment Variables for CoarseGrainedExecutorBackend Containers — prepareEnvironment
Internal Method
prepareEnvironment(): HashMap[String, String]
prepareEnvironment collects environment-related entries.
prepareEnvironment populates the class path (passing in YarnConfiguration, SparkConf, and the spark.executor.extraClassPath property).
Caution
|
FIXME How does populateClasspath use the input env ?
|
prepareEnvironment collects the executor environment variables set on the current SparkConf, i.e. the Spark properties with the spark.executorEnv. prefix, and adds each of them to the environment using YarnSparkHadoopUtil.addPathToEnvironment(env, key, value).
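A minimal sketch of the prefix-based selection follows. It works on a plain Map instead of a SparkConf, the object and method names are hypothetical, and it does not model how addPathToEnvironment combines a value with an entry that already exists.

```scala
object ExecutorEnvSketch {
  val Prefix = "spark.executorEnv."

  // Keeps only properties with the prefix and strips it to get the variable name.
  def executorEnv(props: Map[String, String]): Map[String, String] =
    props.collect {
      case (key, value) if key.startsWith(Prefix) =>
        key.stripPrefix(Prefix) -> value
    }
}
```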
Note
|
SPARK_YARN_USER_ENV is deprecated.
|
prepareEnvironment reads YARN's yarn.http.policy property (falling back to YarnConfiguration.YARN_HTTP_POLICY_DEFAULT) and chooses the secure HTTPS scheme for container log URLs when the policy is HTTPS_ONLY.
With the input container defined and the SPARK_USER environment variable available, prepareEnvironment registers the SPARK_LOG_URL_STDERR and SPARK_LOG_URL_STDOUT environment entries with stderr?start=-4096 and stdout?start=-4096 added to [httpScheme][address]/node/containerlogs/[containerId]/[user], respectively.
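The scheme selection and the two log URL entries can be sketched together. The names below are hypothetical, the "http://" fallback and the "/" between the base URL and the file name are assumptions, and the address, container ID and user are passed in as plain strings.

```scala
object LogUrlSketch {
  // Chooses the secure scheme only for the HTTPS_ONLY policy (fallback assumed to be http).
  def httpScheme(yarnHttpPolicy: String): String =
    if (yarnHttpPolicy == "HTTPS_ONLY") "https://" else "http://"

  // Builds the two environment entries that point at the container logs.
  def logUrls(
      yarnHttpPolicy: String,
      address: String,
      containerId: String,
      user: String): Map[String, String] = {
    val base =
      s"${httpScheme(yarnHttpPolicy)}$address/node/containerlogs/$containerId/$user"
    Map(
      "SPARK_LOG_URL_STDERR" -> s"$base/stderr?start=-4096",
      "SPARK_LOG_URL_STDOUT" -> s"$base/stdout?start=-4096")
  }
}
```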
In the end, prepareEnvironment collects all the System environment variables with the SPARK prefix.
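This last step amounts to a filter over the process environment. A minimal sketch, with hypothetical names and the environment passed in as a Map so it can be exercised without touching the real process environment:

```scala
object SparkEnvSketch {
  // Keeps only the variables whose names start with SPARK.
  def sparkPrefixed(env: Map[String, String]): Map[String, String] =
    env.filter { case (name, _) => name.startsWith("SPARK") }
}
```

Calling `SparkEnvSketch.sparkPrefixed(sys.env)` would apply the same filter to the environment of the current JVM.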
Note
|
prepareEnvironment is used when ExecutorRunnable starts CoarseGrainedExecutorBackend in a container and (for debugging purposes) builds launch context diagnostic information (to print it out as an INFO message to the logs).
|
Starting ExecutorRunnable (with CoarseGrainedExecutorBackend) — run
Method
run(): Unit
When called, run prints the following DEBUG message to the logs:
DEBUG ExecutorRunnable: Starting Executor Container
run creates a YARN NMClient (to communicate with the YARN NodeManager service), initializes it with YarnConfiguration and starts it.
Note
|
run uses YarnConfiguration that was given when ExecutorRunnable was created.
|
In the end, run starts CoarseGrainedExecutorBackend in the YARN container.
Note
|
run is used exclusively when YarnAllocator schedules ExecutorRunnables in allocated YARN resource containers.
|
Starting YARN Resource Container — startContainer
Method
startContainer(): java.util.Map[String, ByteBuffer]
startContainer uses YARN NodeManager's NMClient API to start a CoarseGrainedExecutorBackend in a YARN container.
startContainer creates a YARN ContainerLaunchContext.
Note
|
YARN ContainerLaunchContext represents all of the information for the YARN NodeManager to launch a resource container. |
startContainer then sets local resources and environment to the ContainerLaunchContext.
Note
|
startContainer uses local resources given when ExecutorRunnable was created.
|
startContainer sets security tokens to the ContainerLaunchContext (using Hadoop's UserGroupInformation and the current user's credentials).
startContainer sets the command (to launch CoarseGrainedExecutorBackend) to the ContainerLaunchContext.
startContainer sets the application ACLs to the ContainerLaunchContext.
If the spark.shuffle.service.enabled property is enabled, startContainer registers the ContainerLaunchContext with the YARN shuffle service started on the YARN NodeManager under the spark_shuffle service name.
In the end, startContainer requests the YARN NodeManager to start the YARN container with the ContainerLaunchContext.
Note
|
startContainer uses nmClient internal reference to send the request with the YARN resource container given when ExecutorRunnable was created.
|
If any exception happens, startContainer reports a SparkException with the following message:
Exception while starting container [containerId] on host [hostname]
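The error reporting can be sketched as a wrapper around the start request. To stay free of Spark and YARN dependencies, this sketch uses a hypothetical ContainerStartException as a stand-in for SparkException, and it catches non-fatal errors only, which is an assumption about which exceptions are wrapped.

```scala
import scala.util.control.NonFatal

object StartContainerSketch {
  // Hypothetical stand-in for SparkException (keeps the sketch dependency-free).
  final class ContainerStartException(message: String, cause: Throwable)
    extends Exception(message, cause)

  // Runs the start request and rewraps any non-fatal failure with
  // the message format quoted in the text.
  def startOrReport[A](containerId: String, hostname: String)(start: => A): A =
    try start
    catch {
      case NonFatal(e) =>
        throw new ContainerStartException(
          s"Exception while starting container $containerId on host $hostname", e)
    }
}
```

Keeping the original exception as the cause preserves the underlying stack trace for diagnosis.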
Note
|
startContainer is used exclusively when ExecutorRunnable is started.
|
Building Launch Context Diagnostic Information (with Command, Environment and Resources) — launchContextDebugInfo
Method
launchContextDebugInfo(): String
launchContextDebugInfo prepares the command to launch CoarseGrainedExecutorBackend (as the commands value) and collects environment variables for CoarseGrainedExecutorBackend containers (as the env value).
launchContextDebugInfo returns the launch context debug info.
===============================================================================
YARN executor launch context:
env:
[key] -> [value]
...
command:
[commands]
resources:
[key] -> [value]
===============================================================================
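A report in the layout above could be rendered along these lines. This is a sketch under assumptions: the names are hypothetical, the separator width is a guess, the command parts are joined with single spaces, and the lines carry no indentation because none is visible in the text.

```scala
object LaunchContextDebugInfoSketch {
  // Assumption: the exact separator width is not taken from the source.
  private val Separator = "=" * 79

  // Renders env, command and resources in the layout shown in the text.
  def debugInfo(
      env: Seq[(String, String)],
      commands: Seq[String],
      resources: Seq[(String, String)]): String = {
    def entries(pairs: Seq[(String, String)]): Seq[String] =
      pairs.map { case (key, value) => s"$key -> $value" }

    val lines =
      Seq(Separator, "YARN executor launch context:", "env:") ++
        entries(env) ++
        Seq("command:", commands.mkString(" "), "resources:") ++
        entries(resources) ++
        Seq(Separator)
    lines.mkString("\n")
  }
}
```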
Note
|
resources entry is the input localResources given when ExecutorRunnable was created.
|
Note
|
launchContextDebugInfo is used when ApplicationMaster registers itself with YARN ResourceManager.
|