# QueryPlan — Structured Query Plan

```scala
abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] {
  def output: Seq[Attribute]
  def validConstraints: Set[Expression]
  // FIXME
}
```
`QueryPlan` is a part of Catalyst that models a tree of relational operators, i.e. a structured query.
In Scala terms, `QueryPlan` is an abstract class that is the base class of `LogicalPlan` and `SparkPlan` (for logical and physical plans, respectively).
A `QueryPlan` has output attributes (that serve as the basis of the schema), a collection of expressions, and a schema.
`QueryPlan` has a `statePrefix` that is used when displaying a plan: `!` indicates an invalid plan and `'` indicates an unresolved plan.
A `QueryPlan` is invalid if there are missing input attributes and its children are non-empty.
A `QueryPlan` is unresolved if the column names have not been verified and the column types have not been looked up in the Catalog.
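A quick way to see `statePrefix` in action (a minimal sketch; the exact tree rendering varies across Spark versions): the parsed, not-yet-analyzed plan of a query with an unresolved column reference is rendered with a leading `'`, while the analyzed plan is not.

```scala
// The parsed (unanalyzed) plan still carries an unresolved 'id reference,
// so its rendering starts with the ' statePrefix; the analyzed plan does not.
val q = spark.range(3).select("id")
println(q.queryExecution.logical)   // e.g. 'Project [unresolvedalias('id, ...)] ...
println(q.queryExecution.analyzed)  // e.g. Project [id#0L] ...
```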
## QueryPlan Contract
| Method | Description |
|---|---|
| `expressions` | Attribute expressions |
| `outputSet` | Property. Caution: FIXME |
| `producedAttributes` | Property. Caution: FIXME |
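For intuition on `outputSet` (a sketch modeled on the Spark source; details may differ between versions), it is just the `output` attributes viewed as an `AttributeSet`, which makes attribute-containment checks cheap:

```scala
// outputSet wraps the output attributes in an AttributeSet,
// so callers can test attribute membership efficiently.
def outputSet: AttributeSet = AttributeSet(output)
```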
## Missing Input Attributes — missingInput Property

```scala
def missingInput: AttributeSet
```
`missingInput` are the attributes that are referenced in expressions but neither provided by this node's children (as `inputSet`) nor produced by this node itself (as `producedAttributes`).
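As a sanity check (a minimal sketch; the rendering of an empty `AttributeSet` differs between Spark versions), a fully-resolved plan should have no missing input attributes:

```scala
// A resolved plan references only attributes its children provide,
// so missingInput is expected to be empty.
val dataset = spark.range(3)
dataset.queryExecution.analyzed.missingInput.isEmpty  // expected: true
```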
## Query Output Schema — schema Property
You can request the schema of a `QueryPlan` using `schema`, which builds a `StructType` from the output attributes.
```scala
// the query
val dataset = spark.range(3)

scala> dataset.queryExecution.analyzed.schema
res6: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
```
## Output Schema — output Property

```scala
output: Seq[Attribute]
```
`output` is a collection of Catalyst attributes that represents the result of a projection in a query and is later used to build the schema.
> Note: The `output` property is also called the *output schema* or *result schema*.
You can access `output` on any query plan, at every phase of query execution.
```scala
// the query
val dataset = spark.range(3)

scala> dataset.queryExecution.analyzed.output
res0: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> dataset.queryExecution.withCachedData.output
res1: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> dataset.queryExecution.optimizedPlan.output
res2: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> dataset.queryExecution.sparkPlan.output
res3: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)

scala> dataset.queryExecution.executedPlan.output
res4: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(id#0L)
```
You can build a `StructType` from the `output` collection of attributes using the `toStructType` method (available through the implicit class `AttributeSeq`).
```scala
scala> dataset.queryExecution.analyzed.output.toStructType
res5: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
```
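Since `schema` is itself built from the output attributes, the two should agree (a quick check, assuming the `dataset` from the examples above):

```scala
// schema and output.toStructType are built from the same attributes
assert(dataset.queryExecution.analyzed.schema ==
  dataset.queryExecution.analyzed.output.toStructType)
```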