YarnSchedulerBackend — Foundation for Coarse-Grained Scheduler Backends for YARN
YarnSchedulerBackend
is a CoarseGrainedSchedulerBackend that acts as the foundation for the concrete deploy mode-specific Spark scheduler backends for YARN, i.e. YarnClientSchedulerBackend and YarnClusterSchedulerBackend for client
deploy mode and cluster
deploy mode, respectively.
YarnSchedulerBackend
registers itself as YarnScheduler
RPC endpoint in the RPC Environment.
YarnSchedulerBackend
is ready to accept task launch requests right after the sufficient executors are registered (that varies on dynamic allocation being enabled or not).
Note
|
With no extra configuration, YarnSchedulerBackend is ready for task launch requests when 80% of all the requested executors are available.
|
Note
|
YarnSchedulerBackend is an private[spark] abstract class and is never created directly (but only indirectly through the concrete implementations YarnClientSchedulerBackend and YarnClusterSchedulerBackend).
|
Name | Initial Value | Description |
---|---|---|
Ratio for minimum number of registered executors to claim
|
Minimum expected number of executors that is used to ensure that sufficient resources are available (and start accepting task launch requests). |
|
YarnSchedulerEndpoint object |
||
|
Total expected number of executors that is used to ensure that sufficient resources are available (and start accepting task launch requests). Updated to the final value when Spark on YARN starts (in client mode or cluster mode). |
|
(undefined) |
YARN’s ApplicationAttemptId of a Spark application. Only defined in Set when YarnClusterSchedulerBackend starts (and bindToYarn is called) using YARN’s Used for applicationAttemptId which is a part of SchedulerBackend Contract. |
|
Controls whether to reset Disabled (i.e. |
Resetting YarnSchedulerBackend — reset
Method
Note
|
reset is a part of CoarseGrainedSchedulerBackend Contract.
|
reset
resets the parent CoarseGrainedSchedulerBackend
scheduler backend and ExecutorAllocationManager (accessible by SparkContext.executorAllocationManager
).
doRequestTotalExecutors
Method
def doRequestTotalExecutors(requestedTotal: Int): Boolean
Note
|
doRequestTotalExecutors is a part of the CoarseGrainedSchedulerBackend Contract.
|
doRequestTotalExecutors
simply sends a blocking RequestExecutors message to YarnScheduler RPC Endpoint with the input requestedTotal
and the internal localityAwareTasks
and hostToLocalTaskCount
attributes.
Caution
|
FIXME The internal attributes are already set. When and how? |
Starting the Backend — start
Method
start
creates a SchedulerExtensionServiceBinding
object (using SparkContext
, appId
, and attemptId
) and starts it (using SchedulerExtensionServices.start(binding)
).
Note
|
A SchedulerExtensionServices object is created when YarnSchedulerBackend is initialized and available as services .
|
Ultimately, it calls the parent’s CoarseGrainedSchedulerBackend.start.
Note
|
|
Stopping the Backend — stop
Method
stop
calls the parent’s CoarseGrainedSchedulerBackend.requestTotalExecutors (using (0, 0, Map.empty)
parameters).
Caution
|
FIXME Explain what 0, 0, Map.empty means after the method’s described for the parent.
|
It calls the parent’s CoarseGrainedSchedulerBackend.stop.
Ultimately, it stops the internal SchedulerExtensionServiceBinding
object (using services.stop()
).
Caution
|
FIXME Link the description of services.stop() here.
|
Recording Application and Attempt Ids — bindToYarn
Method
bindToYarn(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit
bindToYarn
sets the internal appId
and attemptId
to the value of the input parameters, appId
and attemptId
, respectively.
Note
|
start requires appId .
|
Requesting YARN for Spark Application’s Current Attempt Id — applicationAttemptId
Method
applicationAttemptId(): Option[String]
Note
|
applicationAttemptId is a part of SchedulerBackend Contract.
|
applicationAttemptId
requests the internal YARN’s ApplicationAttemptId for the Spark application’s current attempt id.
Creating YarnSchedulerBackend Instance
Note
|
This section is only to take notes about the required components to instantiate the base services. |
YarnSchedulerBackend
takes the following when created:
YarnSchedulerBackend
initializes the internal properties.
Checking if Enough Executors Are Available — sufficientResourcesRegistered
Method
sufficientResourcesRegistered(): Boolean
Note
|
sufficientResourcesRegistered is a part of the CoarseGrainedSchedulerBackend contract that makes sure that sufficient resources are available.
|
sufficientResourcesRegistered
is positive, i.e. true
, when totalRegisteredExecutors is exactly or above minRegisteredRatio of totalExpectedExecutors.