val spark: SparkSession = ...
spark.sessionState.optimizer
Optimizer — Base for Logical Query Plan Optimizers
Optimizer
is the base rule-based logical query plan optimizer in Spark SQL that uses Catalyst Framework to optimize logical query plans using optimization rules.
Note
|
SparkOptimizer is the one and only custom Optimizer .
|
Optimizer
is available as optimizer of a SessionState
.
Optimizer
is a RuleExecutor that defines collection of logical plan optimization rules.
Batch Name | Strategy | Rules | Description |
---|---|---|---|
Finish Analysis |
|
EliminateSubqueryAliases |
|
EliminateView |
|||
ReplaceExpressions |
|||
ReplaceDeduplicateWithAggregate |
|||
Union |
|
CombineUnions |
|
Subquery |
|
OptimizeSubqueries |
|
ReplaceIntersectWithSemiJoin |
|||
ReplaceExceptWithAntiJoin |
|||
ReplaceDistinctWithAggregate |
|||
RemoveLiteralFromGroupExpressions |
|||
RemoveRepetitionFromGroupExpressions |
|||
PushProjectionThroughUnion |
|||
EliminateOuterJoin |
|||
PushPredicateThroughJoin |
|||
InferFiltersFromConstraints |
|||
Collapses Repartition and RepartitionByExpression |
|||
CollapseProject |
|||
CollapseWindow |
|||
CombineFilters |
|||
CombineLimits |
|||
CombineUnions |
|||
OptimizeIn |
|||
ReorderAssociativeOperator |
|||
LikeSimplification |
|||
BooleanSimplification |
|||
SimplifyConditionals |
|||
RemoveDispensableExpressions |
|||
SimplifyBinaryComparison |
|||
PruneFilters |
|||
EliminateSorts |
|||
SimplifyCaseConversionExpressions |
|||
RewriteCorrelatedScalarSubquery |
|||
RemoveRedundantAliases |
|||
RemoveRedundantProject |
|||
SimplifyCreateStructOps |
|||
SimplifyCreateArrayOps |
|||
SimplifyCreateMapOps |
|||
Check Cartesian Products |
|
CheckCartesianProducts |
|
Join Reorder |
|
||
ConvertToLocalRelation |
|||
OptimizeCodegen |
|
OptimizeCodegen |
|
RewriteSubquery |
|
RewritePredicateSubquery |
|
CollapseProject |
Tip
|
Consult the sources of Optimizer for the up-to-date list of the optimization rules.
|
Note
|
Catalyst is a Spark SQL framework for manipulating trees. It can work with trees of relational operators and expressions in logical plans before they end up as physical execution plans. |
scala> sql("select 1 + 1 + 1").explain(true)
== Parsed Logical Plan ==
'Project [unresolvedalias(((1 + 1) + 1), None)]
+- OneRowRelation$
== Analyzed Logical Plan ==
((1 + 1) + 1): int
Project [((1 + 1) + 1) AS ((1 + 1) + 1)#4]
+- OneRowRelation$
== Optimized Logical Plan ==
Project [3 AS ((1 + 1) + 1)#4]
+- OneRowRelation$
== Physical Plan ==
*Project [3 AS ((1 + 1) + 1)#4]
+- Scan OneRowRelation[]
Name | Initial Value | Description |
---|---|---|
|
Used in Replace Operators, Aggregate, Operator Optimizations, Decimal Optimizations, Typed Filter Optimization and LocalRelation batches (and also indirectly in the User Provided Optimizers rule batch in SparkOptimizer). |
Creating Optimizer Instance
Optimizer
takes the following when created:
Optimizer
initializes the internal properties.