WholeStageCodegenExec Unary Operator with Java Code Generation

WholeStageCodegenExec is a unary physical operator that supports code generation for a codegened pipeline of a single physical operator.

WholeStageCodegenExec is created when CollapseCodegenStages physical preparation rule transforms a physical plan and spark.sql.codegen.wholeStage is enabled.

Note
spark.sql.codegen.wholeStage property is enabled by default.

WholeStageCodegenExec is marked with * prefix in the tree output of a physical plan.

Note
Use executedPlan phase of a query execution to see WholeStageCodegenExec in the plan.
val q = spark.range(9)
val plan = q.queryExecution.executedPlan
scala> println(plan.numberedTreeString)
00 *Range (0, 9, step=1, splits=8)
Table 1. WholeStageCodegenExec SQLMetrics (in alphabetical order)
Name Description

pipelineTime

duration

spark sql WholeStageCodegenExec webui.png
Figure 1. WholeStageCodegenExec in web UI (Details for Query)
Tip
Use explain operator to know the physical plan of a query and find out whether or not WholeStageCodegen is in use.
val q = spark.range(10).where('id === 4)
// Note the stars in the output that are for codegened operators
scala> q.explain
== Physical Plan ==
*Filter (id#0L = 4)
+- *Range (0, 10, step=1, splits=8)
Tip
Consider using Debugging Query Execution facility to deep dive into whole stage codegen.
scala> q.queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Filter (id#5L = 4)
+- *Range (0, 10, step=1, splits=8)
Note
Physical plans that support code generation extend CodegenSupport.
Tip

Enable DEBUG logging level for org.apache.spark.sql.execution.WholeStageCodegenExec logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.WholeStageCodegenExec=DEBUG

Refer to Logging.

doConsume Method

Caution
FIXME

Executing WholeStageCodegenExec — doExecute Method

doExecute(): RDD[InternalRow]

doExecute generates the Java code that is compiled right afterwards.

If compilation fails and spark.sql.codegen.fallback is enabled, you should see the following WARN message in the logs and doExecute returns the result of executing the child physical operator.

WARN WholeStageCodegenExec: Whole-stage codegen disabled for this plan:
[tree]

If however code generation and compilation went well, doExecute branches off per the number of input RDDs.

Note
doExecute only supports up to two input RDDs.
Caution
FIXME
Note
doExecute is a part of SparkPlan Contract to produce the result of a structured query as an RDD of internal binary rows.

Generating Java Code for Child Subtree — doCodeGen Method

doCodeGen(): (CodegenContext, CodeAndComment)
Caution
FIXME

You should see the following DEBUG message in the logs:

DEBUG WholeStageCodegenExec:
[cleanedSource]
Note
doCodeGen is used when WholeStageCodegenExec doExecute (and for debugCodegen).

results matching ""

    No results matching ""