SharedState — Shared State Across SparkSessions

SharedState is an internal class that holds the shared state across active SparkSessions.

Table 1. SharedState's Attributes
Name                    Type                                Description
cacheManager            CacheManager
externalCatalog         ExternalCatalog
globalTempViewManager   GlobalTempViewManager
jarClassLoader          NonClosableMutableURLClassLoader
listener                SQLListener
sparkContext            SparkContext
warehousePath
SharedState takes a SparkContext when created. It also adds hive-site.xml to Hadoop’s Configuration in the current SparkContext if found on CLASSPATH.

Note
hive-site.xml is an optional Hive configuration file when working with Hive in Spark.

SharedState is created lazily, i.e. when it is first accessed after a SparkSession is created. That can happen when a new session is created or when the shared services are accessed.
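The lazy, create-once behaviour can be illustrated with a small sketch. It is plain Python rather than Spark code: `SharedStateSketch`, `SessionSketch` and `new_session` are hypothetical names that only model the idea that the shared state is built on first access and then reused by sessions derived from the first one.

```python
from functools import cached_property


class SharedStateSketch:
    """Hypothetical stand-in for the shared state; counts how often it is built."""

    created = 0

    def __init__(self, context):
        SharedStateSketch.created += 1
        self.context = context


class SessionSketch:
    """Hypothetical stand-in for a session that builds or inherits shared state."""

    def __init__(self, context, existing_shared_state=None):
        self.context = context
        self._existing = existing_shared_state

    @cached_property
    def shared_state(self):
        # Runs on first access only; later accesses reuse the cached object.
        if self._existing is not None:
            return self._existing
        return SharedStateSketch(self.context)

    def new_session(self):
        # Reading shared_state here forces its creation if it does not exist yet.
        return SessionSketch(self.context, self.shared_state)


first = SessionSketch(context="sc")
created_before_access = SharedStateSketch.created  # nothing has been built yet
second = first.new_session()                       # first access: state is built
same_object = first.shared_state is second.shared_state
```

In this model, creating the first session builds nothing. The shared state appears only when something asks for it, and every later session sees the same object.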

When created, SharedState sets hive.metastore.warehouse.dir to the value of spark.sql.warehouse.dir if hive.metastore.warehouse.dir is not set or spark.sql.warehouse.dir is set. Otherwise, i.e. when hive.metastore.warehouse.dir is set and spark.sql.warehouse.dir is not, spark.sql.warehouse.dir is set to the value of hive.metastore.warehouse.dir.

In the latter case, you should see the following INFO message in the logs:

INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('[hiveWarehouseDir]').

In either case, you should then see the following INFO message with the resolved warehouse path:

INFO SharedState: Warehouse path is '[warehousePath]'.
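
The precedence between the two warehouse properties can be sketched as a small function. This is a plain-Python model of the rule described above, not Spark's implementation: `resolve_warehouse_path`, the single `settings` dictionary, and the `default_dir` fallback (standing in for whatever default spark.sql.warehouse.dir has when neither property is set) are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("SharedState")

HIVE_KEY = "hive.metastore.warehouse.dir"
SPARK_KEY = "spark.sql.warehouse.dir"


def resolve_warehouse_path(settings, default_dir="spark-warehouse"):
    """Reconcile the two warehouse properties in `settings` and return the path.

    `settings` is a plain dict; `default_dir` is an assumed fallback value.
    """
    hive_dir = settings.get(HIVE_KEY)
    if hive_dir is not None and SPARK_KEY not in settings:
        # Only the Hive property is set: the Spark property follows it.
        settings[SPARK_KEY] = hive_dir
        logger.info(
            "%s is not set, but %s is set. Setting %s to the value of %s ('%s').",
            SPARK_KEY, HIVE_KEY, SPARK_KEY, HIVE_KEY, hive_dir,
        )
        path = hive_dir
    else:
        # The Spark property is set, or neither is: the Hive property follows
        # the Spark property (or the assumed default).
        path = settings.get(SPARK_KEY, default_dir)
        settings[HIVE_KEY] = path
    logger.info("Warehouse path is '%s'.", path)
    return path


# Usage: only the Hive property is set, so the Spark property adopts its value.
only_hive = {HIVE_KEY: "/user/hive/warehouse"}
resolved = resolve_warehouse_path(only_hive)
```

When both properties are set, the `else` branch applies, so the value of spark.sql.warehouse.dir takes precedence and overwrites hive.metastore.warehouse.dir.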

Tip
Enable INFO logging level for the org.apache.spark.sql.internal.SharedState logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.internal.SharedState=INFO

Refer to Logging.
