build/sbt 'sql/testOnly *BenchmarkWholeStageCodegen'
Whole-Stage Code Generation (aka Whole-Stage CodeGen)
Note
|
Review SPARK-12795 Whole stage codegen to learn about the work to support it. |
Whole-Stage Code Generation (aka Whole-Stage CodeGen) fuses multiple operators (as a subtree of plans that support code generation) together into a single Java function that is aimed at improving execution performance. It collapses a query into a single optimized function that eliminates virtual function calls and leverages CPU registers for intermediate data.
Note
|
Whole-Stage Code Generation is enabled by default (using spark.sql.codegen.wholeStage property). |
Note
|
Whole stage codegen is used by some modern massively parallel processing (MPP) databases to archive great performance. See Efficiently Compiling Efficient Query Plans for Modern Hardware (PDF). |
Note
|
Janino is used to compile a Java source code into a Java class. |
Before a query is executed, CollapseCodegenStages physical preparation rule is used to find the plans that support codegen and collapse them together as WholeStageCodegen
. It is part of the sequence of rules QueryExecution.preparations that will be applied in order to the physical plan before execution.
BenchmarkWholeStageCodegen — Performance Benchmark
BenchmarkWholeStageCodegen
class provides a benchmark to measure whole stage codegen performance.
You can execute it using the command:
Note
|
You need to un-ignore tests in BenchmarkWholeStageCodegen by replacing ignore with test .
|
$ build/sbt 'sql/testOnly *BenchmarkWholeStageCodegen'
...
Running benchmark: range/limit/sum
Running case: range/limit/sum codegen=false
22:55:23.028 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Running case: range/limit/sum codegen=true
Java HotSpot(TM) 64-Bit Server VM 1.8.0_77-b03 on Mac OS X 10.10.5
Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
range/limit/sum: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------
range/limit/sum codegen=false 376 / 433 1394.5 0.7 1.0X
range/limit/sum codegen=true 332 / 388 1581.3 0.6 1.1X
[info] - range/limit/sum (10 seconds, 74 milliseconds)