(JoinType, Seq[Expression], Seq[Expression], Option[Expression], LogicalPlan, LogicalPlan)
JoinSelection Execution Planning Strategy
JoinSelection is an execution planning strategy (of SparkPlanner) that translates Join logical operator to one of the available join physical operators per join physical operator selection requirements.
| Physical Join Operator | Selection Requirements |
|---|---|
There are joining keys and one of the following holds:
|
|
There are joining keys and one of the following holds:
|
|
Left join keys orderable |
|
There are no joining keys and one of the following holds:
|
|
|
There are no joining keys and join type is |
Default when no other have matched |
|
Note
|
JoinSelection uses ExtractEquiJoinKeys to destructure a Join logical plan.
|
ExtractEquiJoinKeys
ExtractEquiJoinKeys is a pattern used to destructure a Join logical operator into a tuple for join physical operator selection.
Is Left-Side Plan At Least 3 Times Smaller Than Right-Side Plan? — muchSmaller Internal Condition
muchSmaller(a: LogicalPlan, b: LogicalPlan): Boolean
muchSmaller condition holds when plan a is at least 3 times smaller than plan b.
Internally, muchSmaller calculates the estimated statistics for the input logical plans and compares their physical size in bytes (sizeInBytes).
|
Note
|
muchSmaller is used exclusively when JoinSelection checks join selection requirements for ShuffledHashJoinExec physical operator.
|
canBuildLocalHashMap Internal Condition
canBuildLocalHashMap(plan: LogicalPlan): Boolean
canBuildLocalHashMap condition holds for the logical plan whose single partition is small enough to build a hash table (i.e. spark.sql.autoBroadcastJoinThreshold multiplied by spark.sql.shuffle.partitions).
Internally, canBuildLocalHashMap calculates the estimated statistics for the input logical plans and takes the size in bytes (sizeInBytes).
|
Note
|
canBuildLocalHashMap is used when JoinSelection checks join selection requirements for ShuffledHashJoinExec physical operator.
|
canBuildLeft Internal Condition
canBuildLeft(joinType: JoinType): Boolean
canBuildLeft condition holds for CROSS, INNER and RIGHT OUTER join types. Otherwise, canBuildLeft is false.
|
Note
|
canBuildLeft is used when JoinSelection checks join selection requirements for BroadcastHashJoinExec, ShuffledHashJoinExec or BroadcastNestedLoopJoinExec physical operators.
|
canBuildRight Internal Condition
canBuildRight(joinType: JoinType): Boolean
canBuildRight condition holds for joins that are:
-
CROSS, INNER, LEFT ANTI, LEFT OUTER, LEFT SEMI or Existence
Otherwise, canBuildRight is false.
|
Note
|
canBuildRight is used when JoinSelection checks join selection requirements for BroadcastHashJoinExec, ShuffledHashJoinExec or BroadcastNestedLoopJoinExec physical operators.
|
Can Logical Plan Be Broadcast? — canBroadcast Internal Condition
canBroadcast(plan: LogicalPlan): Boolean
canBroadcast condition holds for logical operators with statistics that can be broadcast and of non-negative size up to spark.sql.autoBroadcastJoinThreshold.