FileSourceStrategy Execution Planning Strategy

FileSourceStrategy is an execution planning strategy (of SparkPlanner) that destructures and then optimizes a LogicalPlan.

Tip

Enable the INFO logging level for the org.apache.spark.sql.execution.datasources.FileSourceStrategy logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.datasources.FileSourceStrategy=INFO

Refer to Logging.

Caution
FIXME

PhysicalOperation

PhysicalOperation is a pattern (a Scala extractor object) used to destructure a LogicalPlan object into a tuple of projections, filter predicates, and the remaining plan.

(Seq[NamedExpression], Seq[Expression], LogicalPlan)

The following idiom is often used in Strategy implementations (e.g. HiveTableScans, InMemoryScans, DataSourceStrategy, FileSourceStrategy):

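def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
  // projections: Seq[NamedExpression], predicates: Seq[Expression], child: LogicalPlan
  // (the third binding is named child here so it does not shadow the plan parameter)
  case PhysicalOperation(projections, predicates, child) =>
    // do something
  case _ => Nil
}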

Whenever PhysicalOperation is used to pattern match against a LogicalPlan, its unapply method is called.

unapply(plan: LogicalPlan): Option[ReturnType]

unapply uses the collectProjectsAndFilters method, which recursively destructures the input LogicalPlan.

Note
unapply is little more than a call to the collectProjectsAndFilters method (with some manipulation of the return value).
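To show the extractor mechanics in isolation, here is a minimal sketch in plain Scala with no Spark dependency. ToyPlan, ToyOperation and describe are hypothetical names made up for this example and are not Spark classes. The call counter is there only to make it visible that a case ToyOperation(...) pattern goes through unapply.

```scala
object ExtractorDemo {
  // Hypothetical stand-in for a logical plan; NOT a Spark class.
  final case class ToyPlan(columns: Seq[String], conditions: Seq[String], source: String)

  // Hypothetical extractor object that mimics the shape of PhysicalOperation.
  object ToyOperation {
    type ReturnType = (Seq[String], Seq[String], String)

    // Incremented on every unapply call (for illustration only).
    var calls: Int = 0

    // Called by the compiler-generated code for `case ToyOperation(a, b, c)`.
    // Returning None makes the pattern fail and matching falls through.
    def unapply(plan: ToyPlan): Option[ReturnType] = {
      calls += 1
      if (plan.source.isEmpty) None
      else Some((plan.columns, plan.conditions, plan.source))
    }
  }

  def describe(plan: ToyPlan): String = plan match {
    case ToyOperation(columns, conditions, source) =>
      val cols = columns.mkString(",")
      val conds = conditions.mkString(" AND ")
      s"scan $source, keep $cols, where $conds"
    case _ => "unsupported"
  }

  def main(args: Array[String]): Unit = {
    println(describe(ToyPlan(Seq("id"), Seq("id > 1"), "t")))
    println(s"unapply calls: ${ToyOperation.calls}")
  }
}
```

The three names bound in the case clause are the three elements of the tuple wrapped in Some, which is the same contract the (Seq[NamedExpression], Seq[Expression], LogicalPlan) return type implies for PhysicalOperation.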

collectProjectsAndFilters Method

collectProjectsAndFilters(plan: LogicalPlan):
  (Option[Seq[NamedExpression]], Seq[Expression], LogicalPlan, Map[Attribute, Expression])

collectProjectsAndFilters is a method that recursively destructures a LogicalPlan that is a Project, Filter or BroadcastHint. Any other LogicalPlan gives an all-empty response (no projections, no filters, no aliases) together with the plan itself.
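The recursion described above can be sketched on a toy plan model. This is a simplified illustration under stated assumptions, not Spark's implementation: the Plan, Relation, Project, Filter and Hint types below are hypothetical stand-ins that use plain strings for expressions, and the alias map (the fourth element of the real return type) is left out entirely.

```scala
object ToyPhysicalOperation {
  // Hypothetical toy plan nodes; strings stand in for expressions.
  sealed trait Plan { def output: Seq[String] }
  final case class Relation(name: String, output: Seq[String]) extends Plan
  final case class Project(fields: Seq[String], child: Plan) extends Plan {
    def output: Seq[String] = fields
  }
  final case class Filter(condition: String, child: Plan) extends Plan {
    def output: Seq[String] = child.output
  }
  final case class Hint(child: Plan) extends Plan { // plays the role of BroadcastHint
    def output: Seq[String] = child.output
  }

  // Walks down through Project, Filter and Hint nodes and returns
  // (outermost projection if any, collected filter conditions, first other node).
  def collectProjectsAndFilters(plan: Plan): (Option[Seq[String]], Seq[String], Plan) =
    plan match {
      case Project(fields, child) =>
        val (_, filters, other) = collectProjectsAndFilters(child)
        (Some(fields), filters, other)
      case Filter(condition, child) =>
        val (fields, filters, other) = collectProjectsAndFilters(child)
        (fields, filters :+ condition, other)
      case Hint(child) =>
        collectProjectsAndFilters(child)
      case other =>
        // The "all-empty" case: nothing collected, the node is returned as-is.
        (None, Nil, other)
    }

  // Thin wrapper over collectProjectsAndFilters: when no projection was found,
  // fall back to the output of the remaining node.
  def unapply(plan: Plan): Option[(Seq[String], Seq[String], Plan)] = {
    val (fields, filters, child) = collectProjectsAndFilters(plan)
    Some((fields.getOrElse(child.output), filters, child))
  }

  def main(args: Array[String]): Unit = {
    val relation = Relation("t", Seq("id", "name"))
    val plan = Project(Seq("id"), Filter("id > 1", Hint(Filter("name = 'a'", relation))))
    plan match {
      case ToyPhysicalOperation(projections, predicates, leaf) =>
        println(s"projections=$projections predicates=$predicates leaf=$leaf")
    }
  }
}
```

In this sketch the filters closest to the leaf come first in the collected sequence, and a plan that is none of the three handled node types comes back unchanged with nothing collected.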
