SQLExecution Helper Object

SQLExecution defines the spark.sql.execution.id key, which is used to track the multiple jobs that constitute a single SQL query execution. Whenever a SQL query is about to be executed, the withNewExecutionId static method sets the key.

Note	Jobs without the spark.sql.execution.id key are not considered to belong to any SQL query execution.
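The check described in the note can be sketched as a small helper. This is only an illustration, not Spark's own code: the `ExecutionIdCheck` object and `executionIdOf` method are hypothetical names, and the job's properties are assumed to arrive as a `java.util.Properties` instance (as they do in `SparkListenerJobStart.properties`).

```scala
import java.util.Properties

object ExecutionIdCheck {
  // Same value as SQLExecution.EXECUTION_ID_KEY
  val ExecutionIdKey = "spark.sql.execution.id"

  // Returns the execution id carried in a job's properties, if any.
  // Both the properties object and the key may be absent, hence the two Options.
  def executionIdOf(props: Properties): Option[String] =
    Option(props).flatMap(p => Option(p.getProperty(ExecutionIdKey)))
}
```

A job for which `executionIdOf` returns `None` would be treated as not belonging to any SQL query execution.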

spark.sql.execution.id EXECUTION_ID_KEY Key

val EXECUTION_ID_KEY = "spark.sql.execution.id"

Tracking Multi-Job SQL Query Executions — `withNewExecutionId` Method

withExecutionId[T](
  sc: SparkContext,
  executionId: String)(body: => T): T  (1)

withNewExecutionId[T](
  sparkSession: SparkSession,
  queryExecution: QueryExecution)(body: => T): T  (2)

(1) With an explicit executionId execution identifier
(2) QueryExecution variant with an auto-generated execution identifier

withExecutionId and withNewExecutionId execute the body query action with the execution id local property set (to the given executionId or to an auto-generated identifier, respectively).

The execution id is set as the spark.sql.execution.id local property.

The use case is to track Spark jobs (e.g. jobs running in separate threads) that belong to a single SQL query execution so that, for example, they are reported as one Spark SQL query in the web UI.

Note	`withNewExecutionId` is used in Dataset.withNewExecutionId.

Caution	FIXME Where is the proxy-like method used? How important is it?

If an execution id is already set (as the spark.sql.execution.id local property), it is replaced for the duration of the current action.
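The replace-for-the-duration behaviour could look roughly like the sketch below. It is a simplified model and not Spark's implementation: `FakeContext` is a hypothetical stand-in for the local properties of `SparkContext`, and restoring the previous value in a `finally` block is an assumption based on the description above.

```scala
import scala.collection.mutable

object LocalPropertySketch {
  val ExecutionIdKey = "spark.sql.execution.id"

  // Hypothetical stand-in for SparkContext's local properties.
  // As with SparkContext.setLocalProperty, a null value removes the key.
  final class FakeContext {
    private val props = mutable.Map.empty[String, String]

    def setLocalProperty(key: String, value: String): Unit =
      if (value == null) { props.remove(key); () }
      else props.update(key, value)

    def getLocalProperty(key: String): String = props.getOrElse(key, null)
  }

  // Runs body with the execution id set, then puts the previous value back
  def withExecutionId[T](ctx: FakeContext, executionId: String)(body: => T): T = {
    val previous = ctx.getLocalProperty(ExecutionIdKey)
    try {
      ctx.setLocalProperty(ExecutionIdKey, executionId)
      body
    } finally {
      // Restore whatever was set before (or clear the key if nothing was)
      ctx.setLocalProperty(ExecutionIdKey, previous)
    }
  }
}
```

With this model, a context that already carries execution id "1" sees "2" only while `withExecutionId(ctx, "2") { ... }` runs and sees "1" again afterwards.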

In addition, the QueryExecution variant posts SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd events (to the LiveListenerBus event bus) before and after executing the body action, respectively. The events inform SQLListener when a SQL query execution starts and ends.
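The start and end events around the body action can be modelled as follows. Everything here is a hypothetical simplification: `ExecutionStart`, `ExecutionEnd` and `RecordingBus` stand in for the real events and for LiveListenerBus, the counter-based identifier is an assumed way of auto-generating ids, and posting the end event from a `finally` block is a design choice of this sketch.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable.ArrayBuffer

object ExecutionEventsSketch {
  // Hypothetical, simplified events; the real ones carry more fields
  sealed trait Event
  final case class ExecutionStart(executionId: Long) extends Event
  final case class ExecutionEnd(executionId: Long) extends Event

  // Hypothetical stand-in for the listener bus: it only records posted events
  final class RecordingBus {
    val events = ArrayBuffer.empty[Event]
    def post(event: Event): Unit = { events += event; () }
  }

  // Assumed source of auto-generated execution ids
  private val nextExecutionId = new AtomicLong(0)

  // Posts a start event, runs body, then posts an end event even if body throws
  def withNewExecutionId[T](bus: RecordingBus)(body: => T): T = {
    val executionId = nextExecutionId.getAndIncrement()
    bus.post(ExecutionStart(executionId))
    try body
    finally bus.post(ExecutionEnd(executionId))
  }
}
```

A listener subscribed to such a bus would see exactly one start and one end event per call, both carrying the same execution id.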

Note	Nested execution ids are not supported in the `QueryExecution` variant.
