CodegenSupport — Physical Operators with Optional Java Code Generation

CodegenSupport is the extension of the SparkPlan contract for physical operators that support Java code generation (aka codegen).

CodegenSupport allows physical operators to disable codegen using the supportCodegen flag.

Tip	Use the debugCodegen or QueryExecution.debug.codegen methods to review the Java source code generated by `CodegenSupport` operators.

val q = spark.range(1)

import org.apache.spark.sql.execution.debug._
scala> q.debugCodegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Range (0, 1, step=1, splits=8)

Generated code:
...

// The above is equivalent to the following method chain
scala> q.queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Range (0, 1, step=1, splits=8)

Generated code:
...

CodegenSupport Contract

package org.apache.spark.sql.execution

trait CodegenSupport extends SparkPlan {
  // only required methods that have no implementation
  def doProduce(ctx: CodegenContext): String
  def inputRDDs(): Seq[RDD[InternalRow]]
}

Table 1. (Subset of) CodegenSupport Contract (in alphabetical order)
Method	Description
`doProduce`	Used exclusively in the final produce method to generate the Java source code for processing the internal binary rows from the input RDDs.
`inputRDDs`

Generating Java Source Code For…FIXME — `consume` Final Method

Caution

FIXME

`supportCodegen` Flag

supportCodegen: Boolean = true

Note	`supportCodegen` is used exclusively when `CollapseCodegenStages` checks if a physical operator supports codegen.

Note	`supportCodegen` is disabled for the following physical operators: `GenerateExec`; `HashAggregateExec` with `ImperativeAggregates`; `SortMergeJoinExec` for all join types except `INNER` and `CROSS`.
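The role of the flag can be pictured with a small self-contained model. This is only a sketch under assumptions: `Operator`, `ToyCodegenSupport`, `ToyRange`, `ToyGenerate`, `ToyScan` and `ToyCollapse` are hypothetical names made up for illustration, not Spark classes, and the check in Spark's `CollapseCodegenStages` may consider more than this single flag.

```scala
// Toy model only: every type here is hypothetical and not part of Spark.
trait Operator {
  def name: String
  def children: Seq[Operator]
}

// Operators that can generate code expose a flag that defaults to true.
trait ToyCodegenSupport extends Operator {
  def supportCodegen: Boolean = true
}

// Keeps the default, so codegen stays enabled.
case class ToyRange(children: Seq[Operator] = Nil) extends ToyCodegenSupport {
  val name = "Range"
}

// Mixes in the trait but opts out by overriding the flag.
case class ToyGenerate(children: Seq[Operator]) extends ToyCodegenSupport {
  val name = "Generate"
  override def supportCodegen: Boolean = false
}

// Does not mix in the trait at all.
case class ToyScan(children: Seq[Operator] = Nil) extends Operator {
  val name = "LocalTableScan"
}

object ToyCollapse {
  // An operator qualifies only if it has the trait and leaves the flag enabled.
  def supportsCodegen(op: Operator): Boolean = op match {
    case c: ToyCodegenSupport => c.supportCodegen
    case _                    => false
  }

  // Renders a plan tree, prefixing qualifying operators with a star.
  def render(op: Operator, depth: Int = 0): String = {
    val marker = if (supportsCodegen(op)) "*" else ""
    val line = ("  " * depth) + marker + op.name
    (line +: op.children.map(render(_, depth + 1))).mkString("\n")
  }
}
```

In this model `ToyCollapse.render(ToyGenerate(Seq(ToyRange())))` puts a star on `Range` only, because `ToyGenerate` overrides the flag to `false`.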

Producing Java Source Code — `produce` Method

produce(ctx: CodegenContext, parent: CodegenSupport): String

produce generates the Java source code for processing the internal binary rows from the input RDDs.

Internally, produce executes a "query" that creates the Java source code from the result of doProduce.

Note	Executing a "query" means preparing the query for execution and then calling waitForSubqueries.

The blocks of Java source code generated by produce are marked with a PRODUCE: comment.

Tip	Enable the `spark.sql.codegen.comments` property to include the comments in the generated Java source code.

// ./bin/spark-shell -c spark.sql.codegen.comments=true
import org.apache.spark.sql.execution.debug._
val query = Seq((0 to 4).toList).toDF.
  select(explode('value) as "id").
  join(spark.range(1), "id")

scala> query.debugCodegen
Found 2 WholeStageCodegen subtrees.
== Subtree 1 / 2 ==
*Project [id#6]
+- *BroadcastHashJoin [cast(id#6 as bigint)], [id#9L], Inner, BuildRight
   :- Generate explode(value#1), false, false, [id#6]
   :  +- LocalTableScan [value#1]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
      +- *Range (0, 1, step=1, splits=8)
...
/* 043 */   protected void processNext() throws java.io.IOException {
/* 044 */     // PRODUCE: Project [id#6]
/* 045 */     // PRODUCE: BroadcastHashJoin [cast(id#6 as bigint)], [id#9L], Inner, BuildRight
/* 046 */     // PRODUCE: InputAdapter
/* 047 */     while (inputadapter_input.hasNext() && !stopEarly()) {
...
== Subtree 2 / 2 ==
*Range (0, 1, step=1, splits=8)
...
/* 082 */   protected void processNext() throws java.io.IOException {
/* 083 */     // PRODUCE: Range (0, 1, step=1, splits=8)
/* 084 */     // initialize Range

Note	`produce` is used mainly when `WholeStageCodegenExec` is requested to generate the Java source code for a physical plan.
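The relationship between the final produce method and the doProduce hook can be sketched as a template method. The model below is a simplified illustration under assumptions, not Spark's implementation: `ToyContext`, `ToyCodegen`, `ToyRangeOp` and `ToyProject` are hypothetical names, the emitted strings are made-up Java-like text, and the query preparation step is left out.

```scala
// Toy model only: hypothetical names, not Spark's CodegenContext or CodegenSupport.
final class ToyContext(val commentsEnabled: Boolean) {
  // Emits a comment line only when comments are enabled
  // (loosely modelled on the spark.sql.codegen.comments property).
  def comment(text: String): String =
    if (commentsEnabled) s"// $text\n" else ""
}

trait ToyCodegen {
  def name: String

  // Remembered so that a later consume step could hand rows to the parent.
  protected var parent: Option[ToyCodegen] = None

  // Hook that every operator has to implement.
  protected def doProduce(ctx: ToyContext): String

  // Fixed entry point: records the parent, then wraps the output of doProduce.
  final def produce(ctx: ToyContext, parent: ToyCodegen): String = {
    this.parent = Option(parent) // a null parent (the root in this toy) becomes None
    ctx.comment(s"PRODUCE: $name") + doProduce(ctx)
  }
}

// A leaf operator emits the loop that drives the whole pipeline.
final class ToyRangeOp(n: Int) extends ToyCodegen {
  val name = s"Range (0, $n)"
  protected def doProduce(ctx: ToyContext): String =
    s"for (long i = 0; i < $n; i++) { /* consume(i) */ }\n"
}

// A unary operator delegates to its child and passes itself as the parent.
final class ToyProject(child: ToyCodegen) extends ToyCodegen {
  val name = "Project"
  protected def doProduce(ctx: ToyContext): String = child.produce(ctx, this)
}

val plan = new ToyProject(new ToyRangeOp(3))
val withComments = plan.produce(new ToyContext(commentsEnabled = true), null)
val withoutComments = plan.produce(new ToyContext(commentsEnabled = false), null)
```

In this sketch `withComments` starts with one PRODUCE: line per operator, parent first and child second, followed by the loop emitted by the leaf. With comments disabled only the loop remains.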
