QueryPlan — Structured Query Plan

QueryPlan is the part of Catalyst that models a tree of relational operators, i.e. a structured query.

In Scala terms, QueryPlan is an abstract class that is the base class of LogicalPlan and SparkPlan (for logical and physical plans, respectively).

A QueryPlan has output attributes (that serve as the basis for the schema), a collection of expressions, and a schema.

QueryPlan has a statePrefix that is used when displaying a plan: ! indicates an invalid plan and ' indicates an unresolved plan.

A QueryPlan is invalid if it has missing input attributes and its children are non-empty.

A QueryPlan is unresolved if the column names have not been verified and column types have not been looked up in the Catalog.

QueryPlan Contract

abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] {
  def output: Seq[Attribute]
  def validConstraints: Set[Expression]
  // FIXME
}
Table 1. QueryPlan Contract (in alphabetical order)

Method             Description
output             Attribute expressions
validConstraints

outputSet Property

Caution
FIXME

producedAttributes Property

Caution
FIXME

Missing Input Attributes — missingInput Property

def missingInput: AttributeSet

missingInput is the set of attributes that are referenced in expressions but neither provided by this node's children (inputSet) nor produced by the node itself (producedAttributes).
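The set arithmetic behind missingInput can be sketched as follows. This is a toy model with a hypothetical Node class and String stand-ins for Catalyst's Attribute and AttributeSet, not Spark's actual implementation:

```scala
// Toy sketch of the set arithmetic behind missingInput:
// referenced attributes, minus those provided by children (inputSet),
// minus those produced by the node itself (producedAttributes).
object MissingInputSketch {
  type Attribute = String // hypothetical stand-in for Catalyst's Attribute

  case class Node(
      references: Set[Attribute],        // attributes used in expressions
      inputSet: Set[Attribute],          // attributes coming from children
      producedAttributes: Set[Attribute] // attributes this node generates
  ) {
    def missingInput: Set[Attribute] =
      references -- inputSet -- producedAttributes
  }

  def main(args: Array[String]): Unit = {
    // "count" is produced by the node itself (say, an aggregate),
    // so only "b" is genuinely missing.
    val node = Node(
      references = Set("a", "b", "count"),
      inputSet = Set("a"),
      producedAttributes = Set("count"))
    assert(node.missingInput == Set("b"))
    println(node.missingInput)
  }
}
```

A node with an empty missingInput set is, by this definition, valid.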

Query Output Schema — schema Property

You can request the schema of a QueryPlan using schema, which builds a StructType from the output attributes.

// the query
val dataset = spark.range(3)

scala> dataset.queryExecution.analyzed.schema
res6: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))

Output Schema — output Property

output: Seq[Attribute]

output is a collection of Catalyst attributes that represent the result of a projection in a query and is later used to build the schema.

Note
output property is also called output schema or result schema.

You can access the output schema through a LogicalPlan.

// the query
val dataset = spark.range(3)

scala> dataset.queryExecution.analyzed.output
res0: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> dataset.queryExecution.withCachedData.output
res1: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> dataset.queryExecution.optimizedPlan.output
res2: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> dataset.queryExecution.sparkPlan.output
res3: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> dataset.queryExecution.executedPlan.output
res4: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

You can build a StructType from the output collection of attributes using the toStructType method (available through the implicit class AttributeSeq).

scala> dataset.queryExecution.analyzed.output.toStructType
res5: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
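Conceptually, the conversion maps each attribute's name, data type, and nullability onto a StructField. The sketch below models that mapping with hypothetical stand-in types (plain case classes, not Spark's):

```scala
// Sketch of deriving a schema from output attributes: each Attribute
// contributes (name, dataType, nullable) to one StructField.
// Stand-in types only; Spark's real Attribute/StructType are richer.
object SchemaSketch {
  case class Attribute(name: String, dataType: String, nullable: Boolean)
  case class StructField(name: String, dataType: String, nullable: Boolean)
  case class StructType(fields: Seq[StructField])

  def toStructType(output: Seq[Attribute]): StructType =
    StructType(output.map(a => StructField(a.name, a.dataType, a.nullable)))

  def main(args: Array[String]): Unit = {
    // Mirrors the id column of spark.range(3): LongType, non-nullable.
    val out = Seq(Attribute("id", "LongType", nullable = false))
    assert(toStructType(out) ==
      StructType(Seq(StructField("id", "LongType", nullable = false))))
    println(toStructType(out))
  }
}
```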

statePrefix method

statePrefix: String

statePrefix method is used when printing a plan: ! indicates an invalid plan and ' indicates an unresolved plan.
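How the two markers could be combined can be sketched as below. This is a hypothetical toy Plan class under the rules stated earlier (! for missing input with non-empty children, ' for an unresolved plan), not Spark's actual code:

```scala
// Toy sketch of statePrefix: ' wins for unresolved plans, ! marks a
// plan with missing input attributes and non-empty children.
object StatePrefixSketch {
  case class Plan(
      missingInput: Set[String],
      children: Seq[Plan],
      resolved: Boolean) {
    def statePrefix: String =
      if (!resolved) "'"                                       // unresolved
      else if (missingInput.nonEmpty && children.nonEmpty) "!" // invalid
      else ""
    def simpleString(name: String): String = s"$statePrefix$name"
  }

  def main(args: Array[String]): Unit = {
    val child = Plan(Set.empty, Nil, resolved = true)
    assert(Plan(Set.empty, Nil, resolved = false).simpleString("Project") == "'Project")
    assert(Plan(Set("x"), Seq(child), resolved = true).simpleString("Filter") == "!Filter")
    assert(Plan(Set.empty, Seq(child), resolved = true).simpleString("Range") == "Range")
    println("ok")
  }
}
```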
