CodegenSupport — Physical Operators with Optional Java Code Generation

CodegenSupport is the contract (a Scala trait that extends SparkPlan) for physical operators that support Java code generation (aka codegen).

A CodegenSupport operator can still disable codegen for itself through the supportCodegen flag.

Tip
Use the debugCodegen or QueryExecution.debug.codegen methods to review the Java source code generated for CodegenSupport operators.
val q = spark.range(1)

import org.apache.spark.sql.execution.debug._
scala> q.debugCodegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Range (0, 1, step=1, splits=8)

Generated code:
...

// The above is equivalent to the following method chain
scala> q.queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Range (0, 1, step=1, splits=8)

Generated code:
...

CodegenSupport Contract

package org.apache.spark.sql.execution

trait CodegenSupport extends SparkPlan {
  // only required methods that have no implementation
  def doProduce(ctx: CodegenContext): String
  def inputRDDs(): Seq[RDD[InternalRow]]
}
Table 1. (Subset of) CodegenSupport Contract (in alphabetical order)
Method      Description

doProduce   Used exclusively in the final produce method to generate the Java source code for processing the internal binary rows from input RDDs.

inputRDDs

Generating Java Source Code For…​FIXME — consume Final Method

Caution
FIXME

supportCodegen Flag

supportCodegen: Boolean = true
Note
supportCodegen is used exclusively when CollapseCodegenStages checks if a physical operator supports codegen.
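
The role of the flag can be illustrated with a small self-contained model. This is only a sketch: Operator, CodegenLike, DefaultOp, OptOutOp, PlainOp and eligibleForCodegen are hypothetical names invented for this example, not Spark classes, and the real CollapseCodegenStages rule may apply further conditions that are not modeled here.

```scala
// A simplified model of the supportCodegen flag; NOT Spark's actual code.
object SupportCodegenSketch {
  // Hypothetical stand-in for a physical operator (SparkPlan in Spark).
  trait Operator { def name: String }

  // Hypothetical stand-in for CodegenSupport: codegen is on by default,
  // matching `supportCodegen: Boolean = true` from the text.
  trait CodegenLike extends Operator {
    def supportCodegen: Boolean = true
  }

  // An operator that keeps the default.
  case class DefaultOp(name: String) extends CodegenLike

  // An operator that opts out by overriding the flag.
  case class OptOutOp(name: String) extends CodegenLike {
    override def supportCodegen: Boolean = false
  }

  // An operator that does not mix in the codegen trait at all.
  case class PlainOp(name: String) extends Operator

  // Hypothetical check in the spirit of what a planning rule could do:
  // only operators with the trait AND the flag enabled qualify.
  def eligibleForCodegen(op: Operator): Boolean = op match {
    case c: CodegenLike => c.supportCodegen
    case _              => false
  }
}
```

In this model an operator is excluded from codegen either because it never mixes in the trait or because it overrides the flag to false, which is the "optional" part of the contract.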
Note

supportCodegen is disabled for the following physical operators:

Producing Java Source Code — produce Method

produce(ctx: CodegenContext, parent: CodegenSupport): String

produce generates the Java source code for processing the internal binary rows from the input RDDs.

Internally, produce executes a "query" that builds the Java source code from the result of doProduce.

Note
Executing a "query" means preparing the query for execution and then calling waitForSubqueries.

The blocks of Java source code generated by produce are marked with a PRODUCE: comment.

Tip
Enable the spark.sql.codegen.comments property to include the comments in the generated Java source code.
// ./bin/spark-shell -c spark.sql.codegen.comments=true
import org.apache.spark.sql.execution.debug._
val query = Seq((0 to 4).toList).toDF.
  select(explode('value) as "id").
  join(spark.range(1), "id")

scala> query.debugCodegen
Found 2 WholeStageCodegen subtrees.
== Subtree 1 / 2 ==
*Project [id#6]
+- *BroadcastHashJoin [cast(id#6 as bigint)], [id#9L], Inner, BuildRight
   :- Generate explode(value#1), false, false, [id#6]
   :  +- LocalTableScan [value#1]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
      +- *Range (0, 1, step=1, splits=8)
...
/* 043 */   protected void processNext() throws java.io.IOException {
/* 044 */     // PRODUCE: Project [id#6]
/* 045 */     // PRODUCE: BroadcastHashJoin [cast(id#6 as bigint)], [id#9L], Inner, BuildRight
/* 046 */     // PRODUCE: InputAdapter
/* 047 */     while (inputadapter_input.hasNext() && !stopEarly()) {
...
== Subtree 2 / 2 ==
*Range (0, 1, step=1, splits=8)
...
/* 082 */   protected void processNext() throws java.io.IOException {
/* 083 */     // PRODUCE: Range (0, 1, step=1, splits=8)
/* 084 */     // initialize Range
Note
produce is used mainly when WholeStageCodegenExec is requested to generate the Java source code for a physical plan.
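
The relationship between the final produce method and the operator-specific doProduce can be modeled as a template method. The following is a minimal sketch under stated assumptions: Ctx, CodegenLike, RangeLike and the placeholder body are hypothetical names made up for this example (Ctx stands in for CodegenContext), and the query-preparation step that the real produce performs is left out.

```scala
// A simplified model of produce delegating to doProduce; NOT Spark's actual code.
object ProduceSketch {
  // Hypothetical, stripped-down stand-in for CodegenContext.
  // It emits a comment only when comments are enabled, loosely mirroring
  // what the spark.sql.codegen.comments property controls.
  final class Ctx(val commentsEnabled: Boolean) {
    def comment(text: String): String =
      if (commentsEnabled) s"// $text\n" else ""
  }

  trait CodegenLike {
    // Human-readable description used in the PRODUCE: comment.
    def describe: String

    // The only step every operator has to supply itself.
    protected def doProduce(ctx: Ctx): String

    // Final so that operators cannot change the overall flow.
    // `parent` is unused here and kept only to mirror the signature in the text.
    final def produce(ctx: Ctx, parent: CodegenLike): String =
      ctx.comment(s"PRODUCE: $describe") + doProduce(ctx)
  }

  // A toy operator whose generated "code" is just a placeholder string.
  object RangeLike extends CodegenLike {
    val describe = "Range (0, 1, step=1, splits=8)"
    protected def doProduce(ctx: Ctx): String = "/* range loop goes here */\n"
  }
}
```

Calling ProduceSketch.RangeLike.produce(new ProduceSketch.Ctx(true), ProduceSketch.RangeLike) returns the placeholder body prefixed with a PRODUCE: comment line, while a Ctx created with false returns the body alone.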
