Spark local (pseudo-cluster)
You can run Spark in local mode. In this non-distributed single-JVM deployment mode, Spark spawns all the execution components - driver, executor, LocalSchedulerBackend, and master - in the same single JVM. The default parallelism is the number of threads as specified in the master URL. This is the only mode where a driver is used for execution.
The local mode is very convenient for testing, debugging or demonstration purposes as it requires no earlier setup to launch Spark applications.
This mode of operation is also called Spark in-process or (less commonly) a local version of Spark.
SparkContext.isLocal
returns true
when Spark runs in local mode.
scala> sc.isLocal
res0: Boolean = true
Spark shell defaults to local mode with local[*]
as the the master URL.
scala> sc.master
res0: String = local[*]
Tasks are not re-executed on failure in local mode (unless local-with-retries master URL is used).
The task scheduler in local mode works with LocalSchedulerBackend task scheduler backend.
Master URL
You can run Spark in local mode using local
, local[n]
or the most general local[*]
for the master URL.
The URL says how many threads can be used in total:
-
local
uses 1 thread only. -
local[n]
usesn
threads. -
local[*]
uses as many threads as the number of processors available to the Java virtual machine (it uses Runtime.getRuntime.availableProcessors() to know the number).
Caution
|
FIXME What happens when there’s less cores than n in the master URL? It is a question from twitter.
|
-
local[N, maxFailures]
(called local-with-retries) withN
being*
or the number of threads to use (as explained above) andmaxFailures
being the value of spark.task.maxFailures.
Task Submission a.k.a. reviveOffers
When ReviveOffers
or StatusUpdate
messages are received, LocalEndpoint places an offer to TaskSchedulerImpl
(using TaskSchedulerImpl.resourceOffers
).
If there is one or more tasks that match the offer, they are launched (using executor.launchTask
method).
The number of tasks to be launched is controlled by the number of threads as specified in master URL. The executor uses threads to spawn the tasks.