Optimizer — Base for Logical Query Plan Optimizers

Optimizer is the base rule-based logical query plan optimizer in Spark SQL that uses Catalyst Framework to optimize logical query plans using optimization rules.

Note
SparkOptimizer is the one and only custom Optimizer.

Optimizer is available as optimizer of a SessionState.

val spark: SparkSession = ...
spark.sessionState.optimizer
Table 1. Optimizer’s Logical Plan Optimization Rules (in the order of execution)
Batch Name Strategy Rules Description

Finish Analysis

Once

EliminateSubqueryAliases

EliminateView

ReplaceExpressions

ComputeCurrentTime

GetCurrentDatabase

RewriteDistinctAggregates

ReplaceDeduplicateWithAggregate

Union

Once

CombineUnions

Subquery

Once

OptimizeSubqueries

Replace Operators

FixedPoint

ReplaceIntersectWithSemiJoin

ReplaceExceptWithAntiJoin

ReplaceDistinctWithAggregate

Aggregate

FixedPoint

RemoveLiteralFromGroupExpressions

RemoveRepetitionFromGroupExpressions

Operator Optimizations

FixedPoint

PushProjectionThroughUnion

ReorderJoin

EliminateOuterJoin

PushPredicateThroughJoin

PushDownPredicate

LimitPushDown

ColumnPruning

InferFiltersFromConstraints

CollapseRepartition

Collapses Repartition and RepartitionByExpression

CollapseProject

CollapseWindow

CombineFilters

CombineLimits

CombineUnions

NullPropagation

FoldablePropagation

OptimizeIn

ConstantFolding

ReorderAssociativeOperator

LikeSimplification

BooleanSimplification

SimplifyConditionals

RemoveDispensableExpressions

SimplifyBinaryComparison

PruneFilters

EliminateSorts

SimplifyCasts

SimplifyCaseConversionExpressions

RewriteCorrelatedScalarSubquery

EliminateSerialization

RemoveRedundantAliases

RemoveRedundantProject

SimplifyCreateStructOps

SimplifyCreateArrayOps

SimplifyCreateMapOps

Check Cartesian Products

Once

CheckCartesianProducts

Join Reorder

Once

CostBasedJoinReorder

Decimal Optimizations

FixedPoint

DecimalAggregates

Typed Filter Optimization

FixedPoint

CombineTypedFilters

LocalRelation

FixedPoint

ConvertToLocalRelation

PropagateEmptyRelation

OptimizeCodegen

Once

OptimizeCodegen

RewriteSubquery

Once

RewritePredicateSubquery

CollapseProject

Tip
Consult the sources of Optimizer for the up-to-date list of the optimization rules.
Note
Catalyst is a Spark SQL framework for manipulating trees. It can work with trees of relational operators and expressions in logical plans before they end up as physical execution plans.
scala> sql("select 1 + 1 + 1").explain(true)
== Parsed Logical Plan ==
'Project [unresolvedalias(((1 + 1) + 1), None)]
+- OneRowRelation$

== Analyzed Logical Plan ==
((1 + 1) + 1): int
Project [((1 + 1) + 1) AS ((1 + 1) + 1)#4]
+- OneRowRelation$

== Optimized Logical Plan ==
Project [3 AS ((1 + 1) + 1)#4]
+- OneRowRelation$

== Physical Plan ==
*Project [3 AS ((1 + 1) + 1)#4]
+- Scan OneRowRelation[]
Table 2. Optimizer’s Properties (in alphabetical order)
Name Initial Value Description

fixedPoint

FixedPoint with the number of iterations as defined by spark.sql.optimizer.maxIterations

Used in Replace Operators, Aggregate, Operator Optimizations, Decimal Optimizations, Typed Filter Optimization and LocalRelation batches (and also indirectly in the User Provided Optimizers rule batch in SparkOptimizer).

Creating Optimizer Instance

Optimizer takes the following when created:

Optimizer initializes the internal properties.

results matching ""

    No results matching ""