Example 2-workers-on-1-node Standalone Cluster (one executor per worker)

The following steps are a recipe for a Spark Standalone cluster with 2 workers on a single machine.

The aim is to have a complete Spark cluster environment on your laptop.

Important

You use a Spark Standalone cluster by connecting your Spark applications (e.g. spark-shell or spark-submit) to the master with --master MASTER_URL.

For our learning purposes, MASTER_URL is spark://localhost:7077.
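
For example, once the cluster is up and running (after the steps below), you can connect a Spark shell to it like this (a minimal sketch, assuming you run the command from $SPARK_HOME):

    ./bin/spark-shell --master spark://localhost:7077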

  1. Start a standalone master server.

    ./sbin/start-master.sh

    Notes:

    • Read Operating Spark Standalone master

    • Use SPARK_CONF_DIR for the configuration directory (defaults to $SPARK_HOME/conf).

    • Use spark.deploy.retainedApplications (default: 200)

    • Use spark.deploy.retainedDrivers (default: 200)

    • Use spark.deploy.recoveryMode (default: NONE)

    • Use spark.deploy.defaultCores (default: Int.MaxValue)
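
    If you want to change any of the spark.deploy.* properties above, one way to do it (a sketch with arbitrary example values) is to pass them to the master as Java system properties via SPARK_MASTER_OPTS in conf/spark-env.sh:

    # conf/spark-env.sh (example values only)
    SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=50 -Dspark.deploy.defaultCores=4"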

  2. Open the master’s web UI at http://localhost:8080 to see the current setup: no workers and no applications.

    Figure 1. Master’s web UI with no workers and applications
  3. Start the first worker.

    ./sbin/start-slave.sh spark://japila.local:7077
    Note
    The command above in turn executes org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://japila.local:7077
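
    To confirm that the worker has registered with the master, you can tail the worker’s log (a sketch; the exact file name depends on your user and host names, as shown in a later step):

    tail -f logs/spark-*-org.apache.spark.deploy.worker.Worker-1-*.out

    Look for a line similar to "Successfully registered with master spark://japila.local:7077".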
  4. Check out the master’s web UI at http://localhost:8080 to see the current setup: one worker.

    Figure 2. Master’s web UI with one worker ALIVE

    Note the number of CPUs and memory: 8 and 15 GB, respectively (one gigabyte is left for the OS; oh, how generous, my dear Spark!).

  5. Let’s stop the worker to start over with custom configuration. You use ./sbin/stop-slave.sh to stop the worker.

    ./sbin/stop-slave.sh
  6. Check out the master’s web UI at http://localhost:8080 to see the current setup: one worker in the DEAD state.

    Figure 3. Master’s web UI with one worker DEAD
  7. Start a worker using --cores 2 and --memory 4g for two CPU cores and 4 GB of RAM.

    ./sbin/start-slave.sh spark://japila.local:7077 --cores 2 --memory 4g
    Note
    The command translates to org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://japila.local:7077 --cores 2 --memory 4g
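
    Alternatively (a sketch only; not needed for this recipe), the same limits can be set with the SPARK_WORKER_CORES and SPARK_WORKER_MEMORY environment variables, which is exactly what conf/spark-env.sh will do for us in a later step:

    SPARK_WORKER_CORES=2 SPARK_WORKER_MEMORY=4g ./sbin/start-slave.sh spark://japila.local:7077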
  8. Check out the master’s web UI at http://localhost:8080 to see the current setup: one worker ALIVE and another DEAD.

    Figure 4. Master’s web UI with one worker ALIVE and one DEAD
  9. Configure the cluster using conf/spark-env.sh.

    There’s the conf/spark-env.sh.template template to start from.
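
    One way to create it (a sketch) is to copy the template and edit the copy:

    cp conf/spark-env.sh.template conf/spark-env.sh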

    We’re going to use the following conf/spark-env.sh:

    conf/spark-env.sh
    SPARK_WORKER_CORES=2 (1)
    SPARK_WORKER_INSTANCES=2 (2)
    SPARK_WORKER_MEMORY=2g
    1. the number of cores per worker

    2. the number of workers per node (a machine)
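
    Note that both settings apply per worker instance, so with this configuration the single node offers Spark 2 workers × 2 cores = 4 cores and 2 workers × 2 GB = 4 GB of memory in total.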

  10. Start the workers.

    ./sbin/start-slave.sh spark://japila.local:7077

    As the command progresses, it prints out a starting org.apache.spark.deploy.worker.Worker, logging to ... line for each worker. You defined two workers in conf/spark-env.sh using SPARK_WORKER_INSTANCES, so you should see two such lines.

    $ ./sbin/start-slave.sh spark://japila.local:7077
    starting org.apache.spark.deploy.worker.Worker, logging to ../logs/spark-jacek-org.apache.spark.deploy.worker.Worker-1-japila.local.out
    starting org.apache.spark.deploy.worker.Worker, logging to ../logs/spark-jacek-org.apache.spark.deploy.worker.Worker-2-japila.local.out
  11. Check out the master’s web UI at http://localhost:8080 to see the current setup: at least two workers should be ALIVE.

    Figure 5. Master’s web UI with two workers ALIVE
    Note

    Use jps on the master machine to see the JVM instances (given they all run on the same machine, e.g. localhost).

    $ jps
    6580 Worker
    4872 Master
    6874 Jps
    6539 Worker
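
    With both workers ALIVE, you can run a quick smoke test (a sketch) by connecting a Spark shell to the cluster and refreshing the master’s web UI, where the shell should appear under Running Applications, using one executor per worker:

    ./bin/spark-shell --master spark://localhost:7077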
  12. Stop all the instances - the master and the workers.

    ./sbin/stop-all.sh
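
    To double-check that no master or worker JVMs are left behind, you can run jps again (a sketch); only the Jps process itself (plus any unrelated JVMs) should be listed:

    jps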
