SQLExecution Helper Object
SQLExecution defines the spark.sql.execution.id key that is used to track the multiple jobs that constitute a single SQL query execution. Whenever a SQL query is to be executed, the withNewExecutionId static method is used to set the key.
val EXECUTION_ID_KEY = "spark.sql.execution.id"
Note
Jobs without the spark.sql.execution.id key are not considered to belong to SQL query executions.
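To illustrate the note above, here is a minimal sketch of how a listener could decide whether a job belongs to a SQL query execution by looking up the key in the job's properties. SqlJobFilter and executionIdOf are hypothetical names used for illustration only; they are not part of Spark, and the sketch assumes job properties arrive as a java.util.Properties instance.

```scala
import java.util.Properties

// Hypothetical helper (not a Spark class) that classifies jobs by the key.
object SqlJobFilter {
  // Same string as EXECUTION_ID_KEY in the text.
  val ExecutionIdKey = "spark.sql.execution.id"

  // Returns the execution id the job was tagged with, or None when the
  // properties are missing or do not carry the key (a non-SQL job).
  def executionIdOf(jobProperties: Properties): Option[String] =
    Option(jobProperties).flatMap(p => Option(p.getProperty(ExecutionIdKey)))
}
```

A job whose properties yield None would simply be ignored by SQL-specific bookkeeping.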
Tracking Multi-Job SQL Query Executions — withNewExecutionId Method
withExecutionId[T](
  sc: SparkContext,
  executionId: String)(body: => T): T (1)
withNewExecutionId[T](
  sparkSession: SparkSession,
  queryExecution: QueryExecution)(body: => T): T (2)
(1) With an explicit executionId execution identifier
(2) QueryExecution variant with an auto-generated execution identifier
withNewExecutionId executes the body query action with the execution id local property set (either to the explicit executionId or to an auto-generated identifier).
The execution id is set as the spark.sql.execution.id local property.
The use case is to track Spark jobs (e.g. jobs running in separate threads) that belong to a single SQL query execution, e.g. to report them as one single Spark SQL query in the web UI.
Note
withNewExecutionId is used in Dataset.withNewExecutionId.
Caution
FIXME Where is the proxy-like method used? How important is it?
If an execution id is already set (as the spark.sql.execution.id local property), it is replaced for the course of the current action.
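The replace-for-the-duration behaviour of the explicit executionId variant can be sketched as a save, set, restore pattern. This is a simplified model and not Spark's source: LocalPropsSketch and its thread-local store are hypothetical stand-ins for SparkContext and its local properties, and restoring the previous value afterwards is an assumption drawn from "for the course of the current action".

```scala
import java.util.Properties

// Simplified model; the thread-local store stands in for SparkContext's
// per-thread local properties (an assumption made for this sketch).
object LocalPropsSketch {
  val ExecutionIdKey = "spark.sql.execution.id"

  private val localProps = new ThreadLocal[Properties] {
    override def initialValue(): Properties = new Properties()
  }

  // Returns null when the key is not set on the current thread.
  def getLocalProperty(key: String): String = localProps.get().getProperty(key)

  // A null value clears the key, anything else overwrites it.
  def setLocalProperty(key: String, value: String): Unit = {
    if (value == null) localProps.get().remove(key)
    else localProps.get().setProperty(key, value)
    ()
  }

  // Runs body with the given execution id, then puts the previous id back,
  // even when body throws.
  def withExecutionId[T](executionId: String)(body: => T): T = {
    val oldExecutionId = getLocalProperty(ExecutionIdKey)
    try {
      setLocalProperty(ExecutionIdKey, executionId)
      body
    } finally {
      setLocalProperty(ExecutionIdKey, oldExecutionId)
    }
  }
}
```

With this shape, an inner call sees its own id while it runs, and the outer id is visible again once the inner call returns.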
In addition, the QueryExecution variant posts SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd events (to the LiveListenerBus event bus) before and after executing the body action, respectively. The events inform SQLListener when a SQL query execution starts and ends.
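The start and end bracketing can be modelled as follows. Everything in this sketch is a hypothetical stand-in: ExecutionStart and ExecutionEnd play the role of the two listener events, RecordingBus plays the role of the event bus, and posting the end event from a finally block (so it is sent even when the body fails) is an assumption of the sketch.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified model of the event bracketing; none of these types are Spark's.
object ExecutionEventsSketch {
  sealed trait Event
  final case class ExecutionStart(executionId: Long, time: Long) extends Event
  final case class ExecutionEnd(executionId: Long, time: Long) extends Event

  // Stand-in for the listener bus: it only records what was posted, in order.
  final class RecordingBus {
    val events: ArrayBuffer[Event] = ArrayBuffer.empty[Event]
    def post(event: Event): Unit = { events += event; () }
  }

  // Posts a start event, runs body, and always posts an end event afterwards.
  def withNewExecutionId[T](bus: RecordingBus, executionId: Long)(body: => T): T = {
    bus.post(ExecutionStart(executionId, System.currentTimeMillis()))
    try body
    finally bus.post(ExecutionEnd(executionId, System.currentTimeMillis()))
  }
}
```

A listener that pairs the two events by executionId can then compute how long a query ran and which jobs fell in between.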
Note
Nested execution ids are not supported in the QueryExecution variant.
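One way to picture the restriction is a guard that refuses to start a new execution while an id is already set on the current thread. The sketch below is an assumption about the shape of such a guard, not Spark's code: NoNestingSketch, its counter, and the choice of IllegalArgumentException are all illustrative.

```scala
import java.util.concurrent.atomic.AtomicLong

// Simplified model of rejecting nested executions; not Spark's implementation.
object NoNestingSketch {
  private val nextExecutionId = new AtomicLong(0)
  // Holds the current execution id for this thread; null when none is set.
  private val currentExecutionId = new ThreadLocal[String]

  def current: Option[String] = Option(currentExecutionId.get())

  // Generates a fresh id and runs body, but fails fast if an id is already set.
  def withNewExecutionId[T](body: => T): T = {
    if (currentExecutionId.get() != null) {
      throw new IllegalArgumentException("spark.sql.execution.id is already set")
    }
    currentExecutionId.set(nextExecutionId.getAndIncrement().toString)
    try body
    finally currentExecutionId.remove()
  }
}
```

Calling withNewExecutionId inside another withNewExecutionId body on the same thread fails in this model, while sequential calls each get their own identifier.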