SchedulerBackend — Pluggable Scheduler Backends

SchedulerBackend is a pluggable interface to support various cluster managers, e.g. Apache Mesos, Hadoop YARN or Spark’s own Spark Standalone and Spark local.

As the cluster managers differ by their custom task scheduling modes and resource offers mechanisms Spark abstracts the differences in SchedulerBackend contract.

Table 1. Built-In (Direct and Indirect) SchedulerBackends per Cluster Environment
Cluster Environment	SchedulerBackends
Local mode	LocalSchedulerBackend
(base for custom SchedulerBackends)	CoarseGrainedSchedulerBackend
Spark Standalone	StandaloneSchedulerBackend
Spark on YARN	YarnSchedulerBackend: YarnClientSchedulerBackend (for client deploy mode) YarnClusterSchedulerBackend (for cluster deploy mode)
Spark on Mesos	MesosCoarseGrainedSchedulerBackend `MesosFineGrainedSchedulerBackend`

A scheduler backend is created and started as part of SparkContext’s initialization (when TaskSchedulerImpl is started - see Creating Scheduler Backend and Task Scheduler).

Caution

FIXME Image how it gets created with SparkContext in play here or in SparkContext doc.

Scheduler backends are started and stopped as part of TaskSchedulerImpl’s initialization and stopping.

Being a scheduler backend in Spark assumes a Apache Mesos-like model in which "an application" gets resource offers as machines become available and can launch tasks on them. Once a scheduler backend obtains the resource allocation, it can start executors.

Tip	Understanding how Apache Mesos works can greatly improve understanding Spark.

SchedulerBackend Contract

trait SchedulerBackend {
  def applicationId(): String
  def applicationAttemptId(): Option[String]
  def defaultParallelism(): Int
  def getDriverLogUrls: Option[Map[String, String]]
  def isReady(): Boolean
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit
  def reviveOffers(): Unit
  def start(): Unit
  def stop(): Unit
}

Note	`org.apache.spark.scheduler.SchedulerBackend` is a `private[spark]` Scala trait in Spark.

Table 2. SchedulerBackend Contract
Method	Description
`applicationId`	Unique identifier of Spark Application Used when `TaskSchedulerImpl` is asked for the unique identifier of a Spark application (that is actually a part of TaskScheduler contract).
`applicationAttemptId`	Attempt id of a Spark application Only supported by YARN cluster scheduler backend as the YARN cluster manager supports multiple application attempts. Used when… NOTE: `applicationAttemptId` is also a part of TaskScheduler contract and `TaskSchedulerImpl` directly calls the SchedulerBackend’s `applicationAttemptId`.
`defaultParallelism`	Used when `TaskSchedulerImpl` finds the default level of parallelism (as a hint for sizing jobs).
`getDriverLogUrls`	Returns no URLs by default and only supported by YarnClusterSchedulerBackend
`isReady`	Controls whether `SchedulerBackend` is ready (i.e. `true`) or not (i.e. `false`). Enabled by default. Used when `TaskSchedulerImpl` waits until `SchedulerBackend` is ready (which happens just before `SparkContext` is fully initialized).
`killTask`	Reports a `UnsupportedOperationException` by default. Used when: `TaskSchedulerImpl` cancels the tasks for a stage `TaskSetManager` is notified about successful task attempt.
`reviveOffers`	Used when `TaskSchedulerImpl`: Submits tasks (from `TaskSet`) Receives task status updates Notifies `TaskSetManager` that a task has failed Checks for speculatable tasks Gets notified about executor being lost
`start`	Starts `SchedulerBackend`. Used when `TaskSchedulerImpl` is started.
`stop`	Stops `SchedulerBackend`. Used when `TaskSchedulerImpl` is stopped.

SchedulerBackend — Pluggable Scheduler Backends

SchedulerBackend — Pluggable Scheduler Backends

SchedulerBackend Contract

results matching ""

No results matching ""