Spark History Server
Spark History Server is the web UI for completed and running (aka incomplete) Spark applications. It is an extension of Spark’s web UI.
Tip
Enable event logging in your Spark applications using the spark.eventLog.enabled Spark property. History Server can only show applications whose events have been logged.
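The following is a minimal sketch of turning on event logging for every application that reads a given spark-defaults.conf. It assumes the configuration directory is ./conf (override with CONF_DIR) and that event logs go to /tmp/spark-events (override with EVENT_DIR). Both locations are illustrative, not required values. spark.eventLog.dir is the standard Spark property for the event-log location, and History Server has to be configured to read the same directory.

```shell
#!/usr/bin/env bash
# Sketch: enable event logging through spark-defaults.conf.
# CONF_DIR and EVENT_DIR are assumed locations, not Spark defaults.
set -eu

CONF_DIR="${CONF_DIR:-./conf}"
EVENT_DIR="${EVENT_DIR:-/tmp/spark-events}"   # must be an absolute path for the file:// URI below

# Create the configuration directory and the event-log directory up front
mkdir -p "$CONF_DIR" "$EVENT_DIR"

# Append whitespace-separated key/value pairs (the spark-defaults.conf format)
cat >> "$CONF_DIR/spark-defaults.conf" <<EOF
spark.eventLog.enabled  true
spark.eventLog.dir      file://$EVENT_DIR
EOF
```

The script appends on every run, so run it once or edit the file by hand afterwards.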
You can start History Server by executing the start-history-server.sh shell script and stop it using the stop-history-server.sh shell script.
start-history-server.sh accepts the --properties-file [propertiesFile] command-line option, which specifies a properties file with custom Spark properties.
$ ./sbin/start-history-server.sh --properties-file history.properties
If not specified explicitly, Spark History Server uses the default configuration file, i.e. spark-defaults.conf.
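A minimal sketch of the --properties-file route follows. The file name history.properties and the port 18081 are arbitrary examples. The start command only runs when the script is executed from the top of a Spark installation, where ./sbin/start-history-server.sh exists.

```shell
#!/usr/bin/env bash
# Sketch: keep History Server settings in their own properties file.
# history.properties and port 18081 are arbitrary example choices.
set -eu

cat > history.properties <<'EOF'
# Port of the History Server's web UI
spark.history.ui.port  18081
EOF

# Start History Server with that file, but only from the top of a Spark installation
if [ -x ./sbin/start-history-server.sh ]; then
  ./sbin/start-history-server.sh --properties-file history.properties
fi
```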
Tip
Enable logging for History Server to see what happens inside. Refer to Logging.
Starting History Server: start-history-server.sh script
You can start a HistoryServer instance by executing the $SPARK_HOME/sbin/start-history-server.sh script (where SPARK_HOME is the directory of your Spark installation).
$ ./sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to .../spark/logs/spark-jacek-org.apache.spark.deploy.history.HistoryServer-1-japila.out
Internally, the start-history-server.sh script starts the org.apache.spark.deploy.history.HistoryServer standalone application (using the spark-daemon.sh shell script). You can also run it directly in the foreground with spark-class:
$ ./bin/spark-class org.apache.spark.deploy.history.HistoryServer
Tip
Starting Spark History Server explicitly with spark-class makes its execution easier to trace, since the logs are printed to the standard output, i.e. directly to the terminal.
When started, HistoryServer prints out the following INFO message to the logs:
INFO HistoryServer: Started daemon with process name: [processName]
It registers signal handlers (using SignalUtils) for TERM, HUP and INT that log the received signal:
ERROR HistoryServer: RECEIVED SIGNAL [signal]
It initializes security if enabled (using the spark.history.kerberos.enabled setting).
Caution
FIXME Describe initSecurity
It creates a SecurityManager.
It creates an ApplicationHistoryProvider (by reading spark.history.provider).
It creates a HistoryServer and requests it to bind to the port given by spark.history.ui.port.
Tip
The host's IP can be specified using
You should see the following INFO message in the logs:
INFO HistoryServer: Bound HistoryServer to [host], and started at [webUrl]
It registers a shutdown hook to call stop on the HistoryServer instance.
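Since start-history-server.sh launches the server in the background, a script that depends on the web UI may want to wait until the UI port accepts connections. The following is a bash-only sketch using bash's /dev/tcp redirection. The function name wait_for_port is hypothetical, and the host, port and timeout are assumptions to replace with the values from the "Bound HistoryServer to [host], and started at [webUrl]" message.

```shell
#!/usr/bin/env bash
# Sketch: block until a TCP port accepts connections, or give up after a timeout (in seconds).
# Relies on bash's /dev/tcp/HOST/PORT redirection (a bash feature, not a real file).
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local i
  for ((i = 0; i < timeout; i++)); do
    # Opening the socket in a subshell closes it again when the subshell exits
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, `wait_for_port localhost 18080 60` waits roughly 60 seconds at most, assuming the UI listens on port 18080 of the local host. Checking the TCP port keeps the sketch free of external tools such as curl, at the cost of not verifying that the UI actually serves pages.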
Tip
Use the stop-history-server.sh shell script to stop a running History Server.
Stopping History Server: stop-history-server.sh script
You can stop a running instance of HistoryServer using the $SPARK_HOME/sbin/stop-history-server.sh shell script.
$ ./sbin/stop-history-server.sh
stopping org.apache.spark.deploy.history.HistoryServer
Settings
| Setting | Default Value | Description |
|---|---|---|
| spark.history.ui.port | | The port of the History Server's UI. |
| | | The directory with the event logs. The directory has to exist before History Server is started. |
| | | How many Spark applications to retain. |
| | (unbounded) | How many Spark applications to show in the UI. |
| spark.history.kerberos.enabled | | Enable security when working with HDFS with security enabled (Kerberos). |
| | (empty) | Kerberos principal. Required when Kerberos security is enabled. |
| | (empty) | Keytab to use for login to Kerberos. Required when Kerberos security is enabled. |
| spark.history.provider | | The fully-qualified class name for an ApplicationHistoryProvider. |
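As an alternative to a properties file, the following sketch passes History Server settings as JVM system properties through the SPARK_HISTORY_OPTS environment variable, which Spark's launch scripts use for the HistoryServer class. Spark picks up spark.* system properties as Spark settings. The port 18081 is an arbitrary example, and spark.history.provider is set to org.apache.spark.deploy.history.FsHistoryProvider, the file-system-based provider that ships with Spark, only to show the syntax.

```shell
#!/usr/bin/env bash
# Sketch: pass History Server settings as -D system properties instead of a file.
# The port value 18081 is an arbitrary example.
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18081 -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider"

# Start History Server only when run from the top of a Spark installation
if [ -x ./sbin/start-history-server.sh ]; then
  ./sbin/start-history-server.sh
fi
```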