val q = spark.range(9)
val plan = q.queryExecution.executedPlan
scala> println(plan.numberedTreeString)
00 *Range (0, 9, step=1, splits=8)
WholeStageCodegenExec Unary Operator with Java Code Generation
WholeStageCodegenExec is a unary physical operator (i.e. it has a single child operator) that generates Java code for the codegened pipeline of physical operators rooted at its child.
WholeStageCodegenExec is created when the CollapseCodegenStages physical preparation rule transforms a physical plan and spark.sql.codegen.wholeStage is enabled.
Note: The spark.sql.codegen.wholeStage property is enabled by default.
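To see what the property controls, you can switch it off for the session and count the WholeStageCodegenExec operators in the executed plan of a fresh query. This is a sketch that assumes a local SparkSession with spark-sql on the classpath. In spark-shell, `getOrCreate` simply returns the existing `spark`. The helper name `codegenStages` is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.WholeStageCodegenExec
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").appName("wholestage-toggle").getOrCreate()

// Hypothetical helper: counts WholeStageCodegenExec operators in the plan of a new query
def codegenStages(): Int = {
  val q = spark.range(10).where(col("id") === 4)
  q.queryExecution.executedPlan.collect { case w: WholeStageCodegenExec => w }.size
}

spark.conf.set("spark.sql.codegen.wholeStage", true)
val withCodegen = codegenStages()

// With the property disabled, CollapseCodegenStages leaves the physical plan as it is
spark.conf.set("spark.sql.codegen.wholeStage", false)
val withoutCodegen = codegenStages()

// Restore the default
spark.conf.set("spark.sql.codegen.wholeStage", true)
println(s"with: $withCodegen, without: $withoutCodegen")
```

A new query is created inside the helper on purpose. The executed plan of a Dataset is computed lazily and then cached, so reusing one Dataset across the two settings would not reflect the change.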
WholeStageCodegenExec is marked with a * prefix in the tree output of a physical plan.
Note: Use the executedPlan phase of a query execution to see WholeStageCodegenExec in the plan.
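The difference between the phases can be checked directly. In the sketch below, `sparkPlan` is the physical plan before the preparation rules run, and CollapseCodegenStages is one of those rules. `executedPlan` is the plan after the rules. This assumes a local SparkSession. The helper `hasWholeStage` is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").appName("executed-plan").getOrCreate()
val q = spark.range(10).where(col("id") === 4)

// Physical plan before the preparation rules (no WholeStageCodegenExec yet)
val beforePreparation: SparkPlan = q.queryExecution.sparkPlan
// Physical plan after the preparation rules, including CollapseCodegenStages
val afterPreparation: SparkPlan = q.queryExecution.executedPlan

// Hypothetical helper: does the tree contain a WholeStageCodegenExec node?
def hasWholeStage(plan: SparkPlan): Boolean =
  plan.find(_.isInstanceOf[WholeStageCodegenExec]).isDefined

println(beforePreparation.numberedTreeString)
println(afterPreparation.numberedTreeString)
```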
Name | Description |
---|---|
duration | Time spent in the whole-stage codegened pipeline (the pipelineTime metric) |
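The metric can be looked up on the operator itself through its `metrics` map. This sketch assumes a local SparkSession. It also assumes that Spark registers the timing metric under the internal key `pipelineTime`, which holds for the Spark 2.x and 3.x sources. The value only becomes meaningful after that same plan instance has been executed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.WholeStageCodegenExec

val spark = SparkSession.builder.master("local[*]").appName("wholestage-metrics").getOrCreate()

// For spark.range alone, the executed plan is WholeStageCodegenExec over RangeExec
val wsc = spark.range(10).queryExecution.executedPlan
  .collect { case w: WholeStageCodegenExec => w }
  .head

// Keys of the SQL metrics map (expected to include pipelineTime)
println(wsc.metrics.keys.mkString(", "))
```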
Tip: Use the explain operator to see the physical plan of a query and find out whether or not WholeStageCodegen is in use.
val q = spark.range(10).where('id === 4)
// The stars (*) in the output mark codegened operators
scala> q.explain
== Physical Plan ==
*Filter (id#0L = 4)
+- *Range (0, 10, step=1, splits=8)
Tip: Consider using the Debugging Query Execution facility to deep dive into whole-stage codegen.
scala> q.queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Filter (id#5L = 4)
+- *Range (0, 10, step=1, splits=8)
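The same report is also available through the `org.apache.spark.sql.execution.debug` package. It provides an implicit `debugCodegen()` on Datasets and a `codegenString` helper that returns the report as a String, which is handy for tests or saving to a file. This is a sketch assuming a local SparkSession.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._ // debugCodegen() and codegenString
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").appName("debug-codegen").getOrCreate()
val q = spark.range(10).where(col("id") === 4)

// Same output as q.queryExecution.debug.codegen, returned as a String
val report = codegenString(q.queryExecution.executedPlan)
println(report)

// Prints the report directly (implicit conversion of Dataset to DebugQuery)
q.debugCodegen()
```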
Note: Physical operators that support code generation extend CodegenSupport.
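You can list the CodegenSupport operators of a plan together with their `supportCodegen` flag. An operator can extend the trait and still opt out by overriding that flag. This is a sketch assuming a local SparkSession. `nodeName` is the operator class name without the `Exec` suffix (e.g. `Filter` for FilterExec).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.CodegenSupport
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").appName("codegen-support").getOrCreate()
val plan = spark.range(10).where(col("id") === 4).queryExecution.executedPlan

// Every operator in the tree that extends CodegenSupport, with its supportCodegen flag
val codegenOps: Seq[(String, Boolean)] =
  plan.collect { case op: CodegenSupport => op.nodeName -> op.supportCodegen }

codegenOps.foreach { case (name, supported) => println(s"$name supportCodegen=$supported") }
```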
Tip: Enable DEBUG logging level for the WholeStageCodegenExec logger to see what happens inside. Refer to Logging.
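The DEBUG level for this operator can be enabled with a single logger entry. The fragment below is a sketch that assumes the Log4j 1.x properties file Spark 2.x reads from `conf/log4j.properties`. Newer Spark releases that use Log4j 2 need the equivalent entry in Log4j 2 syntax. The logger name is the fully qualified class name of the operator.

```properties
# conf/log4j.properties (Log4j 1.x syntax)
log4j.logger.org.apache.spark.sql.execution.WholeStageCodegenExec=DEBUG
```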
doConsume Method
Caution: FIXME
Executing WholeStageCodegenExec — doExecute Method
doExecute(): RDD[InternalRow]
doExecute generates the Java code for the child subtree (using doCodeGen) and compiles it right away.
If compilation fails and spark.sql.codegen.fallback is enabled, you should see the following WARN message in the logs, and doExecute returns the result of executing the child physical operator (i.e. without whole-stage codegen).
WARN WholeStageCodegenExec: Whole-stage codegen disabled for this plan:
[tree]
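The compile-or-fall-back decision can be modelled in plain Scala. This is a hypothetical sketch, not Spark's code. `compile` stands in for compiling the generated Java source, `runCompiled` for the whole-stage codegened path, and `childExecute` for executing the child operator directly. A compilation failure is swallowed only when fallback is enabled. Otherwise it propagates to the caller.

```scala
// Hypothetical model of doExecute's fallback logic (no Spark classes involved)
object FallbackSketch {
  def execute[C, R](
      compile: () => C,
      runCompiled: C => R,
      childExecute: () => R,
      fallbackEnabled: Boolean): R = {
    val compiled: Option[C] =
      try Some(compile())
      catch {
        // Spark logs the plan tree here; this sketch prints the exception message instead
        case e: Exception if fallbackEnabled =>
          println(s"WARN WholeStageCodegenExec: Whole-stage codegen disabled for this plan: ${e.getMessage}")
          None
      }
    compiled match {
      case Some(c) => runCompiled(c)   // compilation succeeded: use the generated code
      case None    => childExecute()   // fallback: execute the child operator as is
    }
  }
}
```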
If code generation and compilation succeed, doExecute branches off based on the number of input RDDs of the child operator.
Note: doExecute supports at most two input RDDs.
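The branching on input RDDs can be modelled in plain Scala. In this hypothetical sketch, an "RDD" is a sequence of partitions. `pipeline` stands in for the compiled iterator, which receives the partition index and one row iterator per input. A single input is processed partition by partition, much like `mapPartitionsWithIndex`. Two inputs are paired partition by partition, much like `zipPartitions`, which needs equal partition counts. The names `InputRddsSketch`, `Partitions` and `run` are illustrative only.

```scala
// Hypothetical model of doExecute's one-or-two input RDD branching
object InputRddsSketch {
  // A hypothetical "RDD": a sequence of partitions, each partition a sequence of rows
  type Partitions[T] = Seq[Seq[T]]

  def run[T, U](inputs: Seq[Partitions[T]])(pipeline: (Int, Seq[Iterator[T]]) => Iterator[U]): Seq[List[U]] = {
    require(inputs.nonEmpty && inputs.size <= 2, "Up to two input RDDs can be supported")
    if (inputs.size == 1) {
      // One input: process each partition with its index
      inputs.head.zipWithIndex.map { case (part, idx) =>
        pipeline(idx, Seq(part.iterator)).toList
      }
    } else {
      // Two inputs: pair up partitions with the same index
      val left = inputs(0)
      val right = inputs(1)
      require(left.size == right.size, "Zipped inputs must have the same number of partitions")
      left.zip(right).zipWithIndex.map { case ((l, r), idx) =>
        pipeline(idx, Seq(l.iterator, r.iterator)).toList
      }
    }
  }
}
```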
Caution: FIXME
Note: doExecute is part of the SparkPlan Contract to produce the result of a structured query as an RDD of internal binary rows.
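The contract is reached through `SparkPlan.execute()`, which wraps doExecute and returns the `RDD[InternalRow]`. In the sketch below, the rows are copied before collecting because physical operators may reuse the same InternalRow object across rows. This assumes a local SparkSession.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").appName("execute-plan").getOrCreate()
val plan = spark.range(10).where(col("id") === 4).queryExecution.executedPlan

// execute() ends up calling doExecute of the top-level WholeStageCodegenExec
val rows: RDD[InternalRow] = plan.execute()

// Copy each row before collecting, since operators may reuse InternalRow instances
val ids: Array[Long] = rows.map(_.copy()).collect().map(_.getLong(0))
println(ids.mkString(", "))
```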
Generating Java Code for Child Subtree — doCodeGen Method
doCodeGen(): (CodegenContext, CodeAndComment)
Caution: FIXME
You should see the following DEBUG message in the logs:
DEBUG WholeStageCodegenExec:
[cleanedSource]
Note: doCodeGen is used when WholeStageCodegenExec is executed (doExecute) and by debugCodegen.
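doCodeGen can also be called directly on the WholeStageCodegenExec found in an executed plan to inspect the generated source. This is a sketch assuming a local SparkSession and Spark 2.x/3.x internals. `doCodeGen`, `CodeAndComment.body` and `CodeFormatter.format` are internal APIs that may change between releases.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.codegen.CodeFormatter
import org.apache.spark.sql.execution.WholeStageCodegenExec
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").appName("do-codegen").getOrCreate()
val q = spark.range(10).where(col("id") === 4)

val wsc = q.queryExecution.executedPlan
  .collect { case w: WholeStageCodegenExec => w }
  .head

// doCodeGen returns the codegen context and the generated source with its comments
val (_, source) = wsc.doCodeGen()

// CodeFormatter adds indentation and line numbers to the generated Java class body
println(CodeFormatter.format(source))
```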