JoinSelection Execution Planning Strategy

JoinSelection is an execution planning strategy (of SparkPlanner) that translates Join logical operator to one of the available join physical operators per join physical operator selection requirements.

Table 1. Join Physical Operator Selection Requirements (in execution order)
Physical Join Operator Selection Requirements

BroadcastHashJoinExec

There are joining keys and one of the following holds:

ShuffledHashJoinExec

There are joining keys and one of the following holds:

SortMergeJoinExec

Left join keys orderable

BroadcastNestedLoopJoinExec

There are no joining keys and one of the following holds:

CartesianProductExec

There are no joining keys and join type is INNER or CROSS

BroadcastNestedLoopJoinExec

Default when no other have matched

Note
JoinSelection uses ExtractEquiJoinKeys to destructure a Join logical plan.

ExtractEquiJoinKeys

ExtractEquiJoinKeys is a pattern used to destructure a Join logical operator into a tuple for join physical operator selection.

(JoinType, Seq[Expression], Seq[Expression], Option[Expression], LogicalPlan, LogicalPlan)

Is Left-Side Plan At Least 3 Times Smaller Than Right-Side Plan? — muchSmaller Internal Condition

muchSmaller(a: LogicalPlan, b: LogicalPlan): Boolean

muchSmaller condition holds when plan a is at least 3 times smaller than plan b.

Internally, muchSmaller calculates the estimated statistics for the input logical plans and compares their physical size in bytes (sizeInBytes).

Note
muchSmaller is used exclusively when JoinSelection checks join selection requirements for ShuffledHashJoinExec physical operator.

canBuildLocalHashMap Internal Condition

canBuildLocalHashMap(plan: LogicalPlan): Boolean

canBuildLocalHashMap condition holds for the logical plan whose single partition is small enough to build a hash table (i.e. spark.sql.autoBroadcastJoinThreshold multiplied by spark.sql.shuffle.partitions).

Internally, canBuildLocalHashMap calculates the estimated statistics for the input logical plans and takes the size in bytes (sizeInBytes).

Note
canBuildLocalHashMap is used when JoinSelection checks join selection requirements for ShuffledHashJoinExec physical operator.

canBuildLeft Internal Condition

canBuildLeft(joinType: JoinType): Boolean

canBuildLeft condition holds for CROSS, INNER and RIGHT OUTER join types. Otherwise, canBuildLeft is false.

Note
canBuildLeft is used when JoinSelection checks join selection requirements for BroadcastHashJoinExec, ShuffledHashJoinExec or BroadcastNestedLoopJoinExec physical operators.

canBuildRight Internal Condition

canBuildRight(joinType: JoinType): Boolean

canBuildRight condition holds for joins that are:

  • CROSS, INNER, LEFT ANTI, LEFT OUTER, LEFT SEMI or Existence

Otherwise, canBuildRight is false.

Note
canBuildRight is used when JoinSelection checks join selection requirements for BroadcastHashJoinExec, ShuffledHashJoinExec or BroadcastNestedLoopJoinExec physical operators.

Can Logical Plan Be Broadcast? — canBroadcast Internal Condition

canBroadcast(plan: LogicalPlan): Boolean

canBroadcast condition holds for logical operators with statistics that can be broadcast and of non-negative size up to spark.sql.autoBroadcastJoinThreshold.

results matching ""

    No results matching ""