CheckAnalysis — Analysis Validation

CheckAnalysis defines the checkAnalysis method that Analyzer uses to check whether a logical plan is correct (after all the transformations have been applied) by running validation rules and, in the end, marking the plan as analyzed.

Note
An analyzed logical plan is correct and ready for execution.

CheckAnalysis also defines the extendedCheckRules extension point that allows for extra analysis check rules.

Checking Results of Analysis of Logical Plan and Marking Plan As Analyzed — checkAnalysis Method

checkAnalysis(plan: LogicalPlan): Unit

checkAnalysis recursively checks the correctness of the analysis of the input LogicalPlan and marks it as analyzed.

Note
checkAnalysis fails analysis when it finds an UnresolvedRelation in the input LogicalPlan…​FIXME What else?

Internally, checkAnalysis processes the nodes of the input plan bottom-up (starting from the leaves, i.e. the nodes at the bottom of the operator tree).
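This bottom-up order can be sketched with a toy tree. The Node class below is a simplified stand-in for Spark's TreeNode (which does offer a foreachUp traversal), not the actual class:

```scala
// Minimal sketch (not Spark's actual classes): a tree whose foreachUp
// visits children before their parent, mirroring how checkAnalysis
// walks the operator tree from the leaves up.
case class Node(name: String, children: Seq[Node] = Nil) {
  def foreachUp(f: Node => Unit): Unit = {
    children.foreach(_.foreachUp(f)) // recurse into subtrees first
    f(this)                          // then visit this node
  }
}

val plan = Node("Project", Seq(Node("Filter", Seq(Node("LocalRelation")))))
val visited = scala.collection.mutable.ArrayBuffer.empty[String]
plan.foreachUp(n => visited += n.name)
// visited order starts at the leaf: LocalRelation, Filter, Project
```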

Table 1. checkAnalysis’s Validation Rules (in the order of execution)
LogicalPlan/Operator and Behaviour

UnresolvedRelation

Fails analysis with the error message:

Table or view not found: [tableIdentifier]

Unresolved Attribute

Fails analysis with the error message:

cannot resolve '[expr]' given input columns: [from]

Expression with incorrect input data types

Fails analysis with the error message:

cannot resolve '[expr]' due to data type mismatch: [message]

Unresolved Cast

Fails analysis with the error message:

invalid cast from [dataType] to [dataType]

Grouping

Fails analysis with the error message:

grouping() can only be used with GroupingSets/Cube/Rollup

GroupingID

Fails analysis with the error message:

grouping_id() can only be used with GroupingSets/Cube/Rollup

WindowExpression with AggregateExpression with isDistinct flag enabled

Fails analysis with the error message:

Distinct window functions are not supported: [w]

Example:

val windowedDistinctCountExpr = "COUNT(DISTINCT 1) OVER (PARTITION BY value)"
scala> spark.emptyDataset[Int].selectExpr(windowedDistinctCountExpr)
org.apache.spark.sql.AnalysisException: Distinct window functions are not supported: count(distinct 1) windowspecdefinition(value#95, ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);;
Project [COUNT(1) OVER (PARTITION BY value UnspecifiedFrame)#97L]
+- Project [value#95, COUNT(1) OVER (PARTITION BY value UnspecifiedFrame)#97L, COUNT(1) OVER (PARTITION BY value UnspecifiedFrame)#97L]
   +- Window [count(distinct 1) windowspecdefinition(value#95, ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS COUNT(1) OVER (PARTITION BY value UnspecifiedFrame)#97L], [value#95]
      +- Project [value#95]
         +- LocalRelation <empty>, [value#95]

  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:90)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:108)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:86)

FIXME

FIXME

checkAnalysis then checks whether the plan is fully analyzed (i.e. no logical operators are left unresolved). If any operator is still unresolved, checkAnalysis fails the analysis with an AnalysisException and the following error message:

unresolved operator [o.simpleString]

In the end, checkAnalysis marks the entire logical plan as analyzed.
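The two final steps can be sketched with a simplified model. Op, resolved, and analyzed below are assumed toy stand-ins, not Spark's actual classes:

```scala
// Toy sketch of the final steps of checkAnalysis (assumed names, not
// Spark's API): fail on any operator left unresolved, then flag every
// node in the plan as analyzed.
class AnalysisException(msg: String) extends Exception(msg)

case class Op(name: String, resolved: Boolean, children: Seq[Op] = Nil) {
  var analyzed = false
  def foreachUp(f: Op => Unit): Unit = {
    children.foreach(_.foreachUp(f))
    f(this)
  }
}

def checkAnalysis(plan: Op): Unit = {
  plan.foreachUp { o =>
    if (!o.resolved) throw new AnalysisException(s"unresolved operator ${o.name}")
  }
  plan.foreachUp(_.analyzed = true) // mark the entire plan as analyzed
}

val ok = Op("Project", resolved = true, Seq(Op("LocalRelation", resolved = true)))
checkAnalysis(ok) // succeeds; every node is now flagged as analyzed
```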

Note

checkAnalysis is used when:

Extra Analysis Check Rules — extendedCheckRules Extension Point

extendedCheckRules: Seq[LogicalPlan => Unit]

extendedCheckRules is a collection of rules (functions) that checkAnalysis uses for custom analysis checks (after the main validations have been executed).
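The shape of the extension point can be sketched with plain functions. The Plan type and the noNegativeLimit rule below are hypothetical, not part of Spark:

```scala
// Toy sketch of extendedCheckRules (hypothetical Plan type, not Spark's
// LogicalPlan): extra checks are plain functions that throw when a
// condition does not hold, and they run after the built-in validations.
class AnalysisException(msg: String) extends Exception(msg)

case class Plan(operator: String, limit: Int = 0)

// hypothetical custom rule: reject plans with a negative LIMIT
val noNegativeLimit: Plan => Unit = p =>
  if (p.limit < 0) throw new AnalysisException(s"negative limit: ${p.limit}")

val extendedCheckRules: Seq[Plan => Unit] = Seq(noNegativeLimit)

def runExtendedChecks(plan: Plan): Unit = extendedCheckRules.foreach(_(plan))

runExtendedChecks(Plan("Limit", limit = 10)) // passes: no rule objects
```

In a real SparkSession, such check rules can be registered through SparkSessionExtensions.injectCheckRule.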

Note
When a rule's condition does not hold, the function throws an AnalysisException, either directly or using the failAnalysis method.
