SparkOptimizer — Logical Query Optimizer

SparkOptimizer is the one and only custom logical query plan optimizer in Spark SQL. It extends the base Catalyst Optimizer with additional logical plan optimizations.

Note
You can extend the available logical plan optimizations by registering your own rules with ExperimentalMethods.
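Here is a minimal sketch of such a registration. It assumes Spark 2.x or later on the classpath and a local SparkSession. A custom optimization is a Rule[LogicalPlan] assigned to spark.experimental.extraOptimizations, which SparkOptimizer runs in its User Provided Optimizers batch. LogPlanRule is a hypothetical, do-nothing rule used only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule for illustration: it leaves the plan unchanged and only prints it
object LogPlanRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    println(s"LogPlanRule sees:\n${plan.treeString}")
    plan
  }
}

// In spark-shell, getOrCreate() returns the already-running session
val spark = SparkSession.builder().master("local[*]").appName("extra-optimizations").getOrCreate()

// Register the rule before any query you want it applied to is optimized
spark.experimental.extraOptimizations = Seq(LogPlanRule)

// Forcing optimizedPlan runs the optimizer batches, including the registered rule
val optimized = spark.range(5).filter("id > 1").queryExecution.optimizedPlan
```

The rule is registered before optimizedPlan is first accessed, so it takes part in that optimization run. Queries whose optimized plan was already computed are not re-optimized.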

SparkOptimizer is available as the optimizer attribute of SessionState.

sparkSession.sessionState.optimizer
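The following sketch (assuming Spark 2.x or later and a local session) reads the optimizer attribute and applies its batches by hand to an analyzed plan. It relies on execute, the public entry point that the optimizer inherits from Catalyst's RuleExecutor. The runtime instance may be an anonymous subclass of SparkOptimizer, so the check uses isInstanceOf rather than comparing class names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SparkOptimizer

val spark = SparkSession.builder().master("local[*]").appName("optimizer-attribute").getOrCreate()

// The optimizer attribute of SessionState (typed as the base Catalyst Optimizer)
val optimizer = spark.sessionState.optimizer
println(s"Is a SparkOptimizer: ${optimizer.isInstanceOf[SparkOptimizer]}")

// Apply the optimizer's batches to an analyzed logical plan manually
val df = spark.range(10).filter("id > 5").select("id")
val optimizedByHand = optimizer.execute(df.queryExecution.analyzed)
println(optimizedByHand.treeString)
```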
Note

The result of applying the batches of SparkOptimizer to a LogicalPlan is called the optimized logical plan.

The optimized logical plan of a structured query is available as the optimizedPlan attribute of QueryExecution.

// Apply two filters in sequence on purpose
// to trigger the CombineTypedFilters optimization rule
val dataset = spark.range(10).filter(_ % 2 == 0).filter(_ == 0)

// optimizedPlan is a lazy value:
// the optimizations are triggered only the first time you access it,
// and subsequent calls return the cached, already-optimized plan.
// Use explain to trigger the optimizations again.
scala> dataset.queryExecution.optimizedPlan
res0: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
TypedFilter <function1>, class java.lang.Long, [StructField(value,LongType,true)], newInstance(class java.lang.Long)
+- Range (0, 10, step=1, splits=Some(8))
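To confirm that the two filters were merged, you can count the TypedFilter nodes before and after optimization. This is a sketch assuming Spark 2.x or later. The analyzed plan is expected to hold both filters, while the optimized plan, as in the output above, holds a single combined one.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.TypedFilter

val spark = SparkSession.builder().master("local[*]").appName("combine-typed-filters").getOrCreate()

// Same query as above: two typed (lambda-based) filters in a row
val dataset = spark.range(10).filter(n => n % 2 == 0).filter(n => n == 0)
val qe = dataset.queryExecution

// collect walks the plan tree and gathers every matching node
val analyzedFilters = qe.analyzed.collect { case f: TypedFilter => f }.size
val optimizedFilters = qe.optimizedPlan.collect { case f: TypedFilter => f }.size
println(s"TypedFilter nodes: analyzed=$analyzedFilters, optimized=$optimizedFilters")
```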

Table 1. SparkOptimizer’s Optimization Rules (in the order of execution)

Batch Name                           Strategy     Rules
Optimize Metadata Only Query         Once         OptimizeMetadataOnlyQuery
Extract Python UDF from Aggregate    Once         ExtractPythonUDFFromAggregate
Prune File Source Table Partitions   Once         PruneFileSourcePartitions
User Provided Optimizers             FixedPoint   extraOptimizations
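The Strategy column controls how often a batch runs. Once applies the batch a single time. FixedPoint re-applies the batch until the plan stops changing or an iteration limit is reached. The toy model below illustrates that difference. It is a hypothetical simplification, not Spark's RuleExecutor API: Strategy, Once, FixedPoint, Batch and ToyRuleExecutor are illustrative names, and the "plan" is just an Int.

```scala
// Hypothetical, simplified model of batch execution strategies.
// This is NOT Spark's RuleExecutor API; the names only mirror the table.
sealed trait Strategy { def maxIterations: Int }
case object Once extends Strategy { val maxIterations: Int = 1 }
final case class FixedPoint(maxIterations: Int) extends Strategy

final case class Batch[T](name: String, strategy: Strategy, rules: Seq[T => T])

object ToyRuleExecutor {
  // Applies every rule of a batch once, in order
  private def runOnce[T](plan: T, rules: Seq[T => T]): T =
    rules.foldLeft(plan)((current, rule) => rule(current))

  // Runs batches in order; a batch is re-run while the plan keeps changing
  // and the strategy's iteration limit has not been reached
  def execute[T](plan: T, batches: Seq[Batch[T]]): T =
    batches.foldLeft(plan) { (input, batch) =>
      var iteration = 1
      var previous = input
      var current = runOnce(input, batch.rules)
      while (current != previous && iteration < batch.strategy.maxIterations) {
        previous = current
        current = runOnce(current, batch.rules)
        iteration += 1
      }
      current
    }
}

// Usage: a rule that subtracts 10 while the value is greater than 10
val subtractTen: Int => Int = n => if (n > 10) n - 10 else n
println(ToyRuleExecutor.execute(35, Seq(Batch("once", Once, Seq(subtractTen)))))            // 25
println(ToyRuleExecutor.execute(35, Seq(Batch("fixed", FixedPoint(100), Seq(subtractTen))))) // 5
```

With Once, the rule fires a single time. With FixedPoint, it keeps firing until it no longer changes the value. That behaviour suits user-provided rules, which may need several passes to reach a stable plan.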

Tip

Enable the DEBUG or TRACE logging level for the org.apache.spark.sql.execution.SparkOptimizer logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.SparkOptimizer=TRACE

Refer to Logging.

Creating SparkOptimizer Instance

SparkOptimizer takes the following when created:

Note
SparkOptimizer is created when SessionState is created (which initializes the optimizer property).
