val q = spark.range(1)
import org.apache.spark.sql.execution.debug._
scala> q.debugCodegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Range (0, 1, step=1, splits=8)
Generated code:
...
// The above is equivalent to the following method chain
scala> q.queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Range (0, 1, step=1, splits=8)
Generated code:
...
CodegenSupport — Physical Operators with Optional Java Code Generation
CodegenSupport is an extension of physical operators that support Java code generation (aka codegen).
CodegenSupport also allows physical operators to disable Java code generation.
Tip: Use the debugCodegen or QueryExecution.debug.codegen methods to review the Java source code generated by a CodegenSupport physical operator.
CodegenSupport Contract
package org.apache.spark.sql.execution

trait CodegenSupport extends SparkPlan {
  // only required methods that have no implementation
  def doProduce(ctx: CodegenContext): String
  def inputRDDs(): Seq[RDD[InternalRow]]
}
Method | Description
---|---
doProduce | Used exclusively in the final produce method to generate the Java source code for processing the internal binary rows from the input RDDs.
inputRDDs | The input RDDs of internal binary rows that the generated Java source code processes.
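The contract can be observed from spark-shell. Below is a small inspection sketch (not from the Spark codebase; it uses only public plan-inspection calls and the spark session of spark-shell): it collects the codegen-capable operators of a simple query and asks each one for its input RDDs. doProduce itself is only ever invoked for you from the final produce method, as noted above.
import org.apache.spark.sql.execution.CodegenSupport
// Physical plan before whole-stage codegen collapses the operators
val plan = spark.range(1).queryExecution.sparkPlan
// Every physical operator that participates in Java code generation
val codegenOps = plan.collect { case op: CodegenSupport => op }
codegenOps.foreach { op =>
  println(s"${op.nodeName} declares ${op.inputRDDs().size} input RDD(s)")
}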
Generating Java Source Code For…FIXME — consume Final Method
Caution: FIXME
supportCodegen Flag
supportCodegen: Boolean = true
Note: supportCodegen is used exclusively when CollapseCodegenStages checks if a physical operator supports codegen.
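You can read the flag off a physical plan yourself. The following spark-shell sketch (hypothetical, but limited to public calls shown elsewhere on this page) reports the flag for every codegen-capable operator, in the spirit of the check that CollapseCodegenStages performs:
import org.apache.spark.sql.execution.CodegenSupport
val plan = spark.range(5).join(spark.range(1), "id").queryExecution.sparkPlan
// Report the supportCodegen flag of every operator that extends CodegenSupport
plan.collect { case op: CodegenSupport =>
  s"${op.nodeName}: supportCodegen=${op.supportCodegen}"
}.foreach(println)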
Producing Java Source Code — produce Method
produce(ctx: CodegenContext, parent: CodegenSupport): String
produce creates the Java source code for processing the internal binary rows from the input RDDs.
Internally, produce executes a "query" that generates the Java source code with the result of doProduce.
Note: Executing a "query" is about preparing the query for execution followed by waitForSubqueries.
You can spot the blocks of Java source code generated by produce as they are marked with PRODUCE comments.
Tip: Enable the spark.sql.codegen.comments property to have the comments included in the generated Java source code.
// ./bin/spark-shell --conf spark.sql.codegen.comments=true
import org.apache.spark.sql.execution.debug._
val query = Seq((0 to 4).toList).toDF.
  select(explode('value) as "id").
  join(spark.range(1), "id")
scala> query.debugCodegen
Found 2 WholeStageCodegen subtrees.
== Subtree 1 / 2 ==
*Project [id#6]
+- *BroadcastHashJoin [cast(id#6 as bigint)], [id#9L], Inner, BuildRight
:- Generate explode(value#1), false, false, [id#6]
: +- LocalTableScan [value#1]
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *Range (0, 1, step=1, splits=8)
...
/* 043 */ protected void processNext() throws java.io.IOException {
/* 044 */ // PRODUCE: Project [id#6]
/* 045 */ // PRODUCE: BroadcastHashJoin [cast(id#6 as bigint)], [id#9L], Inner, BuildRight
/* 046 */ // PRODUCE: InputAdapter
/* 047 */ while (inputadapter_input.hasNext() && !stopEarly()) {
...
== Subtree 2 / 2 ==
*Range (0, 1, step=1, splits=8)
...
/* 082 */ protected void processNext() throws java.io.IOException {
/* 083 */ // PRODUCE: Range (0, 1, step=1, splits=8)
/* 084 */ // initialize Range
Note: produce is used mainly when WholeStageCodegenExec is requested to generate the Java source code for a physical plan.
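As a final sketch of that interaction (assuming whole-stage codegen is enabled, which is the default, and a Spark version in which WholeStageCodegenExec.doCodeGen is accessible as in the examples above), you can pick a WholeStageCodegenExec subtree out of an executed plan and request the generated source directly; doCodeGen walks the subtree through produce and doProduce.
import org.apache.spark.sql.catalyst.expressions.codegen.CodeFormatter
import org.apache.spark.sql.execution.WholeStageCodegenExec
// The first whole-stage codegen subtree of a simple query
val wsce = spark.range(1).queryExecution.executedPlan.collectFirst {
  case w: WholeStageCodegenExec => w
}.get
// doCodeGen calls produce on the child operators to build the Java source code
val (ctx, code) = wsce.doCodeGen()
println(CodeFormatter.format(code))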