val q: DataFrame = ...
val plan = q.queryExecution.logical
LogicalPlan — Logical Query Plan / Logical Operator
LogicalPlan
is a base Catalyst query plan for logical operators to build a logical query plan that, when analyzed and resolved, can be resolved to a physical query plan.
Tip
|
Use QueryExecution of a structured query to see the logical plan. |
LogicalPlan
can be analyzed which is to say that the plan (including children) has gone through analysis and verification.
scala> plan.analyzed
res1: Boolean = true
A logical plan can also be resolved to a specific schema.
scala> plan.resolved
res2: Boolean = true
A logical plan knows the size of objects that are results of query operators, like join
, through Statistics
object.
scala> val stats = plan.statistics
stats: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(8,false)
A logical plan knows the maximum number of records it can compute.
scala> val maxRows = plan.maxRows
maxRows: Option[Long] = None
LogicalPlan
can be streaming if it contains one or more structured streaming sources.
LogicalPlan | Description |
---|---|
Logical operator with no child operators |
|
Logical plan with a single child (logical plan). |
|
Logical operator with two child operators |
|
Name | Description |
---|---|
Cached plan statistics (as Computed and cached in stats. Used in stats and verboseStringWithSuffix. Reset in invalidateStatsCache |
Getting Cached or Calculating Estimated Statistics — stats
Method
stats(conf: CatalystConf): Statistics
stats
returns the cached plan statistics or computes a new one (and caches it as statsCache).
Note
|
|
invalidateStatsCache
method
Caution
|
FIXME |
verboseStringWithSuffix
method
Caution
|
FIXME |
resolveQuoted
method
Caution
|
FIXME |
setAnalyzed
method
Caution
|
FIXME |
Command
— Logical Commands
Command
is the base for leaf logical plans that represent non-query commands to be executed by the system. It defines output
to return an empty collection of Attributes.
Known commands are:
-
CreateTable
-
Any RunnableCommand
Is Logical Plan Streaming? — isStreaming
method
isStreaming: Boolean
isStreaming
is a part of the public API of LogicalPlan
and is enabled (i.e. true
) when a logical plan is a streaming source.
By default, it walks over subtrees and calls itself, i.e. isStreaming
, on every child node to find a streaming source.
val spark: SparkSession = ...
// Regular dataset
scala> val ints = spark.createDataset(0 to 9)
ints: org.apache.spark.sql.Dataset[Int] = [value: int]
scala> ints.queryExecution.logical.isStreaming
res1: Boolean = false
// Streaming dataset
scala> val logs = spark.readStream.format("text").load("logs/*.out")
logs: org.apache.spark.sql.DataFrame = [value: string]
scala> logs.queryExecution.logical.isStreaming
res2: Boolean = true
Note
|
Streaming Datasets are part of Structured Streaming. |
Computing Statistics Estimates (of All Child Logical Operators) for Cost-Based Optimizer — computeStats
method
computeStats(conf: CatalystConf): Statistics
computeStats
creates a Statistics
with sizeInBytes
as a product of statistics of all child logical plans.
For a no-children logical plan, computeStats
reports a UnsupportedOperationException
:
LeafNode [nodeName] must implement statistics.
Note
|
computeStats is a protected method that logical operators are expected to override to provide their own custom plan statistics calculation.
|
Note
|
computeStats is used exclusively when LogicalPlan is requested for logical plan statistics estimates.
|