(JoinType, Seq[Expression], Seq[Expression], Option[Expression], LogicalPlan, LogicalPlan)
JoinSelection Execution Planning Strategy
JoinSelection
is an execution planning strategy (of SparkPlanner) that translates Join logical operator to one of the available join physical operators per join physical operator selection requirements.
Physical Join Operator | Selection Requirements |
---|---|
There are joining keys and one of the following holds:
|
|
There are joining keys and one of the following holds:
|
|
Left join keys orderable |
|
There are no joining keys and one of the following holds:
|
|
|
There are no joining keys and join type is |
Default when no other have matched |
Note
|
JoinSelection uses ExtractEquiJoinKeys to destructure a Join logical plan.
|
ExtractEquiJoinKeys
ExtractEquiJoinKeys
is a pattern used to destructure a Join logical operator into a tuple for join physical operator selection.
Is Left-Side Plan At Least 3 Times Smaller Than Right-Side Plan? — muchSmaller
Internal Condition
muchSmaller(a: LogicalPlan, b: LogicalPlan): Boolean
muchSmaller
condition holds when plan a
is at least 3 times smaller than plan b
.
Internally, muchSmaller
calculates the estimated statistics for the input logical plans and compares their physical size in bytes (sizeInBytes
).
Note
|
muchSmaller is used exclusively when JoinSelection checks join selection requirements for ShuffledHashJoinExec physical operator.
|
canBuildLocalHashMap
Internal Condition
canBuildLocalHashMap(plan: LogicalPlan): Boolean
canBuildLocalHashMap
condition holds for the logical plan
whose single partition is small enough to build a hash table (i.e. spark.sql.autoBroadcastJoinThreshold multiplied by spark.sql.shuffle.partitions).
Internally, canBuildLocalHashMap
calculates the estimated statistics for the input logical plans and takes the size in bytes (sizeInBytes
).
Note
|
canBuildLocalHashMap is used when JoinSelection checks join selection requirements for ShuffledHashJoinExec physical operator.
|
canBuildLeft
Internal Condition
canBuildLeft(joinType: JoinType): Boolean
canBuildLeft
condition holds for CROSS, INNER and RIGHT OUTER join types. Otherwise, canBuildLeft
is false
.
Note
|
canBuildLeft is used when JoinSelection checks join selection requirements for BroadcastHashJoinExec , ShuffledHashJoinExec or BroadcastNestedLoopJoinExec physical operators.
|
canBuildRight
Internal Condition
canBuildRight(joinType: JoinType): Boolean
canBuildRight
condition holds for joins that are:
-
CROSS, INNER, LEFT ANTI, LEFT OUTER, LEFT SEMI or Existence
Otherwise, canBuildRight
is false
.
Note
|
canBuildRight is used when JoinSelection checks join selection requirements for BroadcastHashJoinExec , ShuffledHashJoinExec or BroadcastNestedLoopJoinExec physical operators.
|
Can Logical Plan Be Broadcast? — canBroadcast
Internal Condition
canBroadcast(plan: LogicalPlan): Boolean
canBroadcast
condition holds for logical operators with statistics that can be broadcast and of non-negative size up to spark.sql.autoBroadcastJoinThreshold.