SchedulerBackend — Pluggable Scheduler Backends

SchedulerBackend is a pluggable interface to support various cluster managers, e.g. Apache Mesos, Hadoop YARN or Spark’s own Spark Standalone and Spark local.

As the cluster managers differ by their custom task scheduling modes and resource offers mechanisms Spark abstracts the differences in SchedulerBackend contract.

Table 1. Built-In (Direct and Indirect) SchedulerBackends per Cluster Environment
Cluster Environment SchedulerBackends

Local mode

LocalSchedulerBackend

(base for custom SchedulerBackends)

CoarseGrainedSchedulerBackend

Spark Standalone

StandaloneSchedulerBackend

Spark on YARN

Spark on Mesos

A scheduler backend is created and started as part of SparkContext’s initialization (when TaskSchedulerImpl is started - see Creating Scheduler Backend and Task Scheduler).

Caution
FIXME Image how it gets created with SparkContext in play here or in SparkContext doc.

Scheduler backends are started and stopped as part of TaskSchedulerImpl’s initialization and stopping.

Being a scheduler backend in Spark assumes a Apache Mesos-like model in which "an application" gets resource offers as machines become available and can launch tasks on them. Once a scheduler backend obtains the resource allocation, it can start executors.

Tip
Understanding how Apache Mesos works can greatly improve understanding Spark.

SchedulerBackend Contract

trait SchedulerBackend {
  def applicationId(): String
  def applicationAttemptId(): Option[String]
  def defaultParallelism(): Int
  def getDriverLogUrls: Option[Map[String, String]]
  def isReady(): Boolean
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit
  def reviveOffers(): Unit
  def start(): Unit
  def stop(): Unit
}
Note
org.apache.spark.scheduler.SchedulerBackend is a private[spark] Scala trait in Spark.
Table 2. SchedulerBackend Contract
Method Description

applicationId

Unique identifier of Spark Application

Used when TaskSchedulerImpl is asked for the unique identifier of a Spark application (that is actually a part of TaskScheduler contract).

applicationAttemptId

Attempt id of a Spark application

Only supported by YARN cluster scheduler backend as the YARN cluster manager supports multiple application attempts.

Used when…​

NOTE: applicationAttemptId is also a part of TaskScheduler contract and TaskSchedulerImpl directly calls the SchedulerBackend’s applicationAttemptId.

defaultParallelism

Used when TaskSchedulerImpl finds the default level of parallelism (as a hint for sizing jobs).

getDriverLogUrls

Returns no URLs by default and only supported by YarnClusterSchedulerBackend

isReady

Controls whether SchedulerBackend is ready (i.e. true) or not (i.e. false). Enabled by default.

Used when TaskSchedulerImpl waits until SchedulerBackend is ready (which happens just before SparkContext is fully initialized).

killTask

Reports a UnsupportedOperationException by default.

Used when:

reviveOffers

start

Starts SchedulerBackend.

Used when TaskSchedulerImpl is started.

stop

Stops SchedulerBackend.

Used when TaskSchedulerImpl is stopped.

results matching ""

    No results matching ""