Analyzer: Unresolved Logical Plan ==> Analyzed Logical Plan
Analyzer — Logical Query Plan Analyzer
Analyzer is a logical query plan analyzer in Spark SQL that semantically validates and transforms an unresolved logical plan to an analyzed logical plan (with proper relational entities) using logical evaluation rules.
You can access a session-specific Analyzer through SessionState.
val spark: SparkSession = ...
spark.sessionState.analyzer
You can access the analyzed logical plan of a Dataset using the explain operator (with the extended flag enabled) or SQL's EXPLAIN EXTENDED command.
// sample Dataset
val inventory = spark.range(5)
  .withColumn("new_column", 'id + 5 as "plus5")
// Using explain operator (with extended flag enabled)
scala> inventory.explain(extended = true)
== Parsed Logical Plan ==
'Project [*, ('id + 5) AS plus5#81 AS new_column#82]
+- Range (0, 5, step=1, splits=Some(8))
== Analyzed Logical Plan ==
id: bigint, new_column: bigint
Project [id#78L, (id#78L + cast(5 as bigint)) AS new_column#82L]
+- Range (0, 5, step=1, splits=Some(8))
== Optimized Logical Plan ==
Project [id#78L, (id#78L + 5) AS new_column#82L]
+- Range (0, 5, step=1, splits=Some(8))
== Physical Plan ==
*Project [id#78L, (id#78L + 5) AS new_column#82L]
+- *Range (0, 5, step=1, splits=8)
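The SQL route can be sketched in the same way. This is a minimal example, assuming a local SparkSession and a hypothetical temporary view called inventory_view (neither is part of the session above); the exact plan text depends on your Spark version.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session is fine here; in spark-shell reuse the existing `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explain-extended-sketch")
  .getOrCreate()

// Hypothetical view name, registered only for this example.
spark.range(5).createOrReplaceTempView("inventory_view")

// EXPLAIN EXTENDED gives back a Dataset with the plans rendered as text,
// including the "== Analyzed Logical Plan ==" section.
val explained = spark.sql(
  "EXPLAIN EXTENDED SELECT id, id + 5 AS new_column FROM inventory_view")
val planText = explained.collect().map(_.getString(0)).mkString("\n")
println(planText)
```

Collecting the text (rather than calling show) makes it easy to search the output for the section you are interested in.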
Alternatively, you can access the analyzed logical plan through QueryExecution's analyzed attribute (which, together with the numberedTreeString method, is a very good debugging tool).
// Here with numberedTreeString to...please your eyes :)
scala> println(inventory.queryExecution.analyzed.numberedTreeString)
00 Project [id#78L, (id#78L + cast(5 as bigint)) AS new_column#82L]
01 +- Range (0, 5, step=1, splits=Some(8))
Analyzer defines the extendedResolutionRules extension point for additional logical evaluation rules that a custom Analyzer can use to extend the Resolution batch. The rules are added at the end of the Resolution batch.
Note
SessionState uses its own Analyzer with custom extendedResolutionRules, postHocResolutionRules, and extendedCheckRules extension methods.
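One way to see the extension point at work, without subclassing Analyzer, is to register a rule through SparkSession extensions. The sketch below assumes that rules injected with injectResolutionRule end up in the session Analyzer's extendedResolutionRules (that is how recent Spark versions wire it, but verify it for yours). NoopResolutionRule is a hypothetical rule written for this example only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule that leaves every plan untouched; a real rule would
// pattern-match on unresolved operators and return resolved ones.
case class NoopResolutionRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

// Assumption: no SparkSession exists yet, otherwise getOrCreate returns
// the existing session and the extension is not applied.
val sparkWithRule = SparkSession.builder()
  .master("local[*]")
  .appName("extended-resolution-sketch")
  .withExtensions { extensions =>
    extensions.injectResolutionRule(session => NoopResolutionRule(session))
  }
  .getOrCreate()

// List the extra Resolution rules the session-specific Analyzer knows about.
val customRules = sparkWithRule.sessionState.analyzer.extendedResolutionRules
println(customRules.map(_.ruleName).mkString("\n"))
```

The printed list usually contains a few built-in session rules as well, which is the point of the note above.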
Analyzer is created when its owning SessionState is created.
| Name | Description |
|---|---|
| | Additional rules for Resolution batch. Empty by default |
| | Set when |
| | The only rules in Post-Hoc Resolution batch if defined (that are executed in one pass, i.e. |
Analyzer is used by QueryExecution to resolve the managed LogicalPlan (and, as a follow-up, to assert that a structured query has been properly analyzed, i.e. that no failed, unresolved, or otherwise broken logical plan operators and expressions exist).
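The two steps (resolve, then assert) can be replayed by hand. The following is a sketch only, assuming a local SparkSession and a classic (non-Connect) Dataset; QueryExecution does this for you, and the internals differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Assumption: a local session; in spark-shell reuse the existing `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("analysis-sketch")
  .getOrCreate()

val qe = spark.range(5).selectExpr("id + 5 AS new_column").queryExecution
val analyzer = spark.sessionState.analyzer

// Step 1: run the logical evaluation rules over the unresolved plan.
val analyzedByHand = analyzer.execute(qe.logical)

// Step 2: assert that nothing unresolved or broken is left
// (throws AnalysisException otherwise).
analyzer.checkAnalysis(analyzedByHand)

println(s"before: resolved = ${qe.logical.resolved}")
println(s"after:  resolved = ${analyzedByHand.resolved}")

// A query that cannot be resolved fails at analysis time, before any execution.
val failure = Try(spark.range(5).select("no_such_column"))
println(s"invalid query failed: ${failure.isFailure}")
```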
Tip
Enable
Add the following line to
Refer to Logging. The reason for such weird-looking logger names is that
Executing Logical Evaluation Rules — execute
Method
Analyzer is a RuleExecutor that defines the logical evaluation rules for resolving, removing, and in general modifying a logical plan, e.g.
- Resolves unresolved relations and functions (including UnresolvedGenerators) using the provided SessionCatalog
- …
| Batch Name | Strategy | Rules | Description |
|---|---|---|---|
| | | | Adds a BroadcastHint unary operator to a logical plan for |
| | | RemoveAllHints | Removes all the hints (valid or not). |
| Simple Sanity Check | | | Checks whether a function identifier (referenced by an UnresolvedFunction) exists in the function registry. Throws a |
| | | CTESubstitution | Resolves |
| | | | Substitutes UnresolvedWindowExpression with WindowExpression for WithWindowDefinition logical operators. |
| | | EliminateUnions | Eliminates |
| | | SubstituteUnresolvedOrdinals | Replaces ordinals in |
| | | ResolveTableValuedFunctions | Replaces |
| | | | Resolves |
| | | ResolveReferences | |
| | | ResolveCreateNamedStruct | |
| | | ResolveDeserializer | |
| | | ResolveNewInstance | |
| | | ResolveUpCast | |
| | | | Resolves grouping expressions up in a logical plan tree: Expects that all children of a logical operator are already resolved (and given it belongs to a fixed-point batch it will likely happen at some iteration). Fails analysis when |
| | | | Resolves Pivot logical operator to |
| | | ResolveOrdinalInOrderByAndGroupBy | |
| | | ResolveMissingReferences | |
| | | ResolveGenerate | |
| | | | Resolves functions using SessionCatalog: If [name] is expected to be a generator. However, its class is [className], which is not a generator. |
| | | | Replaces |
| | | ResolveSubquery | |
| | | | Resolves WindowExpression expressions |
| | | ResolveNaturalAndUsingJoin | |
| | | | Resolves (aka replaces) |
| | | | Resolves aggregate functions in |
| | | | Resolves TimeWindow expressions to |
| | | ResolveInlineTables | Resolves |
| | | TypeCoercion.typeCoercionRules | |
| View | | AliasViewChild | |
| Nondeterministic | | PullOutNondeterministic | |
| UDF | | HandleNullInputsForUDF | |
| FixNullability | | FixNullability | |
| ResolveTimeZone | | ResolveTimeZone | Replaces |
| | | CleanupAliases | |
Tip
Consult the sources of Analyzer for the up-to-date list of the evaluation rules.
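To get a feel for how batches with Once and FixedPoint strategies behave, here is a toy rule executor in plain Scala. It is a simplified sketch and not Spark's RuleExecutor: the ToyRuleExecutor object, its Batch class and the quote-stripping "rule" are all hypothetical names made up for this illustration.

```scala
object ToyRuleExecutor {
  // Execution strategies, modelled loosely on the Once and FixedPoint strategies.
  sealed trait Strategy { def maxIterations: Int }
  case object Once extends Strategy { val maxIterations: Int = 1 }
  final case class FixedPoint(maxIterations: Int) extends Strategy

  // A batch is a named group of rules that run under one strategy.
  final case class Batch[T](name: String, strategy: Strategy, rules: Seq[T => T])

  // Runs batches in order; within a batch, applies all the rules repeatedly
  // until the plan stops changing or the iteration limit is reached.
  def execute[T](plan: T, batches: Seq[Batch[T]]): T =
    batches.foldLeft(plan) { (input, batch) =>
      var current = input
      var iteration = 0
      var changed = true
      while (changed && iteration < batch.strategy.maxIterations) {
        val next = batch.rules.foldLeft(current)((p, rule) => rule(p))
        changed = next != current
        current = next
        iteration += 1
      }
      current
    }
}

import ToyRuleExecutor._

// Hypothetical "plan": a name whose leading quotes mark it as unresolved.
// The rule removes a single quote per application.
val stripOneQuote: String => String = _.stripPrefix("'")

val once = execute("''id", Seq(Batch("Toy Once", Once, Seq(stripOneQuote))))
val fixed = execute("''id", Seq(Batch("Toy FixedPoint", FixedPoint(100), Seq(stripOneQuote))))
println(s"Once: $once, FixedPoint: $fixed")
```

A single pass leaves one quote behind, while the fixed-point batch keeps going until another pass changes nothing. That is why rules that depend on each other's output sit in fixed-point batches.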
Creating Analyzer Instance
Analyzer takes the following when created:
- Number of iterations (maxIterations) allowed for the FixedPoint rule batches (i.e. Hints, Substitution, Resolution and Cleanup) to converge
Analyzer initializes the internal registries and counters.
Note
Analyzer can also be created without specifying maxIterations, which is then configured using the optimizerMaxIterations configuration setting.
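If you want to check the value in a running session, a quick sketch follows. It assumes that optimizerMaxIterations is backed by the spark.sql.optimizer.maxIterations configuration key; the key name is not given above, so verify it against the SQLConf of your Spark version.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell reuse the existing `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("max-iterations-sketch")
  .getOrCreate()

// Assumed key name behind optimizerMaxIterations (see the note above).
val maxIterations: Option[String] =
  spark.conf.getOption("spark.sql.optimizer.maxIterations")

println(s"maxIterations = ${maxIterations.getOrElse("<not available>")}")
```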
resolver
Method
resolver: Resolver
resolver requests CatalystConf for Resolver.
Note
Resolver is simply a function of two String parameters that returns true if both refer to the same entity (e.g. under case-insensitive equality).
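The idea fits in a few lines of plain Scala. This is a sketch of the concept and not Spark's code: the ResolverSketch object and the findColumn helper are hypothetical and exist only for this example.

```scala
object ResolverSketch {
  // Same shape as described in the note: two names in, a Boolean out.
  type Resolver = (String, String) => Boolean

  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)
  val caseSensitive: Resolver = (a, b) => a == b

  // Hypothetical helper: find the column that a user-typed name refers to.
  def findColumn(columns: Seq[String], name: String, resolver: Resolver): Option[String] =
    columns.find(resolver(_, name))
}

import ResolverSketch._

val columns = Seq("id", "new_column")
val lenient = findColumn(columns, "NEW_COLUMN", caseInsensitive)
val strict = findColumn(columns, "NEW_COLUMN", caseSensitive)
println(s"case-insensitive: $lenient, case-sensitive: $strict")
```

Keeping the comparison behind a function value is what lets the same resolution code work in both case-sensitive and case-insensitive modes.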