CheckAnalysis — Analysis Validation

```scala
checkAnalysis(plan: LogicalPlan): Unit
```

CheckAnalysis defines the checkAnalysis method that Analyzer uses to check whether a logical plan is correct (after all the transformations have been applied) by applying validation rules and, in the end, marking the plan as analyzed.
Note: An analyzed logical plan is correct and ready for execution.
CheckAnalysis also defines the extendedCheckRules extension point that allows for extra analysis check rules.
Checking Results of Analysis of Logical Plan and Marking Plan As Analyzed — checkAnalysis Method

checkAnalysis recursively checks the correctness of the analysis of the input LogicalPlan and marks it as analyzed.
Note: checkAnalysis fails analysis when it finds UnresolvedRelation in the input LogicalPlan. …FIXME What else?
Internally, checkAnalysis processes the nodes in the input plan bottom-up (starting from the leaves, i.e. the nodes at the bottom of the operator tree). checkAnalysis skips logical plans that have already undergone analysis.
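The bottom-up traversal and the skip-if-already-analyzed behaviour can be sketched with a simplified plan tree. This is a self-contained illustration in plain Scala, not the actual Catalyst API — the type names (`Plan`, `Leaf`, `Node`) and the `visitOrder` helper are invented for the example; only `foreachUp` mirrors a real `TreeNode` method.

```scala
// A simplified sketch (NOT the actual Catalyst API) of the bottom-up traversal
// checkAnalysis performs: children are processed before their parent, and a
// subtree already marked as analyzed is skipped entirely.
object CheckAnalysisTraversalSketch {
  sealed trait Plan {
    def name: String
    def children: Seq[Plan]
    var analyzed: Boolean = false

    // Mirrors TreeNode.foreachUp: visit children first, then this node.
    def foreachUp(f: Plan => Unit): Unit =
      if (!analyzed) {
        children.foreach(_.foreachUp(f))
        f(this)
      }
  }
  final case class Leaf(name: String) extends Plan {
    val children: Seq[Plan] = Seq.empty
  }
  final case class Node(name: String, children: Seq[Plan]) extends Plan

  // Records the order in which operators are visited.
  def visitOrder(plan: Plan): Seq[String] = {
    val visited = scala.collection.mutable.Buffer.empty[String]
    plan.foreachUp(p => visited += p.name)
    visited.toSeq
  }
}
```

For a plan `Project → Filter → Relation`, `visitOrder` returns the leaf first (`Relation, Filter, Project`), and returns nothing once the root is marked analyzed.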
| LogicalPlan/Operator | Behaviour |
|---|---|
| Unresolved Attribute | Fails analysis with an error message |
| Unresolved | Fails analysis with an error message |
| WindowExpression with a DISTINCT AggregateExpression | Fails analysis with the error message: Distinct window functions are not supported: [windowExpression] |

Example:

```
val windowedDistinctCountExpr = "COUNT(DISTINCT 1) OVER (PARTITION BY value)"

scala> spark.emptyDataset[Int].selectExpr(windowedDistinctCountExpr)
org.apache.spark.sql.AnalysisException: Distinct window functions are not supported: count(distinct 1) windowspecdefinition(value#95, ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);;
Project [COUNT(1) OVER (PARTITION BY value UnspecifiedFrame)#97L]
+- Project [value#95, COUNT(1) OVER (PARTITION BY value UnspecifiedFrame)#97L, COUNT(1) OVER (PARTITION BY value UnspecifiedFrame)#97L]
   +- Window [count(distinct 1) windowspecdefinition(value#95, ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS COUNT(1) OVER (PARTITION BY value UnspecifiedFrame)#97L], [value#95]
      +- Project [value#95]
         +- LocalRelation <empty>, [value#95]
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:90)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:108)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:86)
```
After the validations, checkAnalysis executes the additional (extended) check rules for correct analysis.

checkAnalysis then checks whether the plan is analyzed correctly (i.e. no logical plans are left unresolved). If any operator remains unresolved, checkAnalysis fails the analysis with an AnalysisException and the following error message:

```
unresolved operator [o.simpleString]
```

In the end, checkAnalysis marks the entire logical plan as analyzed.
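The final unresolved-operator check can be sketched as follows. This is a hedged, simplified model, not the real Spark code — the `Op` type and `checkResolved` helper are invented for illustration; only the error-message shape comes from the text above.

```scala
// Hedged sketch (not the real Spark internals) of the final check: after all
// validation rules run, any operator that is still unresolved fails the
// analysis with an "unresolved operator" error message.
object UnresolvedOperatorCheckSketch {
  // Op stands in for a logical operator; simpleString and resolved mirror
  // the fields the error message and check rely on.
  final case class Op(simpleString: String, resolved: Boolean, children: Seq[Op] = Seq.empty)
  final case class AnalysisException(message: String) extends Exception(message)

  def checkResolved(plan: Op): Unit = {
    // Collect this operator and every operator below it.
    def all(op: Op): Seq[Op] = op +: op.children.flatMap(all)
    all(plan).find(op => !op.resolved).foreach { o =>
      throw AnalysisException(s"unresolved operator ${o.simpleString}")
    }
  }
}
```

A fully resolved tree passes silently; a tree containing any unresolved operator throws with that operator's simpleString in the message.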
Extra Analysis Check Rules — extendedCheckRules Extension Point

```scala
extendedCheckRules: Seq[LogicalPlan => Unit]
```
extendedCheckRules is a collection of rules (functions) that checkAnalysis uses for custom analysis checks (after the main validations have been executed).
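Since each extra rule is just a function `LogicalPlan => Unit`, the extension point can be sketched as below. The rule shown is purely illustrative (there is no such built-in rule), and `LogicalPlan` is faked with a `String` so the sketch is self-contained; only the `Seq[LogicalPlan => Unit]` shape comes from the signature above.

```scala
// Hedged sketch of the extendedCheckRules extension point: extra checks are
// plain functions that throw AnalysisException when a condition does not hold.
object ExtendedCheckRulesSketch {
  type LogicalPlan = String // stand-in for the real LogicalPlan type
  final case class AnalysisException(message: String) extends Exception(message)

  // Illustrative custom rule (invented for this example): reject plans that
  // mention a "forbidden" relation.
  val noForbiddenRelation: LogicalPlan => Unit = plan =>
    if (plan.contains("forbidden"))
      throw AnalysisException(s"forbidden relation in plan: $plan")

  val extendedCheckRules: Seq[LogicalPlan => Unit] = Seq(noForbiddenRelation)

  // checkAnalysis runs every extra rule after the built-in validations.
  def runExtendedChecks(plan: LogicalPlan): Unit =
    extendedCheckRules.foreach(rule => rule(plan))
}
```

A plan that satisfies every rule passes unchanged; the first rule whose condition fails aborts the analysis with an AnalysisException.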
Note: When a condition of a rule does not hold, the function throws an AnalysisException directly or via the failAnalysis method.