SparkOptimizer — Logical Query Optimizer
SparkOptimizer is the one and only custom logical query plan optimizer in Spark SQL. It extends the base Catalyst Optimizer with additional logical plan optimizations.
Note
You can extend the set of available logical plan optimizations and register your own rules using ExperimentalMethods.
SparkOptimizer is available as the optimizer attribute of SessionState (sparkSession.sessionState.optimizer).
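This is a minimal sketch. It assumes Spark SQL is on the classpath, a local master, and a Spark version in which `SparkSession.sessionState` is publicly accessible (it is marked unstable). The sketch checks that the session's optimizer is a `SparkOptimizer`. It then registers a hypothetical no-op rule, `CountingRule`, through `spark.experimental.extraOptimizations`, the `extraOptimizations` field of ExperimentalMethods. The rule only counts how often it is called and returns the plan unchanged. `ExtraOptimizationsDemo` is likewise an illustrative name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkOptimizer

// Hypothetical rule for illustration: counts its invocations and leaves the plan untouched.
object CountingRule extends Rule[LogicalPlan] {
  @volatile var invocations: Int = 0

  override def apply(plan: LogicalPlan): LogicalPlan = {
    invocations += 1
    plan
  }
}

object ExtraOptimizationsDemo {
  // Builds (or reuses) a local SparkSession for the demo.
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("spark-optimizer-demo").getOrCreate()

  // True if the session's logical optimizer is a SparkOptimizer (or a subclass of it).
  def usesSparkOptimizer: Boolean =
    spark.sessionState.optimizer.isInstanceOf[SparkOptimizer]

  // Registers CountingRule as a user-provided optimization, runs a small query
  // (collect triggers logical optimization) and returns how often the rule ran.
  def runWithExtraRule(): Int = {
    spark.experimental.extraOptimizations = Seq(CountingRule)
    spark.range(10).filter("id > 5").collect()
    CountingRule.invocations
  }

  def main(args: Array[String]): Unit = {
    println(s"optimizer is a SparkOptimizer: $usesSparkOptimizer")
    println(s"CountingRule invocations: ${runWithExtraRule()}")
    spark.stop()
  }
}
```

`extraOptimizations` is a mutable setting on the session. The registered rules therefore apply to every query optimized afterwards, and assigning `Nil` removes them again.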
Note
The optimized logical plan of a structured query, i.e. the result of applying the batches of SparkOptimizer to the query's logical plan, is available as the optimizedPlan attribute of
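You can inspect the optimized logical plan from the Dataset API with `explain(extended = true)`. It prints the parsed, analyzed and optimized logical plans together with the physical plan. This is a minimal sketch, assuming Spark SQL on the classpath and a local master. The printed text is captured with `Console.withOut` so it can be inspected programmatically. `OptimizedPlanDemo` is a hypothetical name.

```scala
import java.io.ByteArrayOutputStream
import org.apache.spark.sql.SparkSession

object OptimizedPlanDemo {
  // Builds (or reuses) a local SparkSession for the demo.
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("optimized-plan-demo").getOrCreate()

  // Returns the text that Dataset.explain(extended = true) prints to the console,
  // including the "== Optimized Logical Plan ==" section.
  def extendedExplain(): String = {
    val query = spark.range(10).filter("id > 5").select("id")
    val out = new ByteArrayOutputStream()
    Console.withOut(out) { query.explain(extended = true) }
    out.toString("UTF-8")
  }

  def main(args: Array[String]): Unit = {
    println(extendedExplain())
    spark.stop()
  }
}
```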
SparkOptimizer adds the following batches of logical plan optimizations:

| Batch Name | Strategy | Rules | Description |
|---|---|---|---|
| Optimize Metadata Only Query | Once | OptimizeMetadataOnlyQuery | |
| Extract Python UDF from Aggregate | Once | ExtractPythonUDFFromAggregate | |
| Prune File Source Table Partitions | Once | PruneFileSourcePartitions | |
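An optimizer runs its batches through a rule executor. Each batch applies its rules in order, either once (Once) or repeatedly until the plan stops changing or an iteration limit is reached (a fixed point). The following toy sketch models that mechanism in plain Scala, with no Spark dependency. `Expr`, `Lit`, `Add`, `Mul`, `Strategy`, `Once`, `FixedPoint`, `Batch` and `ToyRuleExecutor` are hypothetical names, loosely modeled on Catalyst's rule executor. They are not Spark's actual classes.

```scala
// Toy expression tree standing in for a logical plan.
sealed trait Expr
final case class Lit(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class Mul(left: Expr, right: Expr) extends Expr

// How many times a batch may run its rules.
sealed trait Strategy { def maxIterations: Int }
case object Once extends Strategy { val maxIterations: Int = 1 }
final case class FixedPoint(maxIterations: Int) extends Strategy

// A named group of rules that share one strategy.
final case class Batch(name: String, strategy: Strategy, rules: (Expr => Expr)*)

object ToyRuleExecutor {
  // Folds Add/Mul nodes whose children are already literals; one level per pass.
  val foldOnce: Expr => Expr = {
    case Add(Lit(a), Lit(b)) => Lit(a + b)
    case Mul(Lit(a), Lit(b)) => Lit(a * b)
    case Add(l, r)           => Add(foldOnce(l), foldOnce(r))
    case Mul(l, r)           => Mul(foldOnce(l), foldOnce(r))
    case lit: Lit            => lit
  }

  // Removes multiplications by the literal 1 anywhere in the tree.
  val removeMulByOne: Expr => Expr = {
    case Mul(x, Lit(1)) => removeMulByOne(x)
    case Mul(Lit(1), x) => removeMulByOne(x)
    case Add(l, r)      => Add(removeMulByOne(l), removeMulByOne(r))
    case Mul(l, r)      => Mul(removeMulByOne(l), removeMulByOne(r))
    case lit: Lit       => lit
  }

  // Applies all rules of a batch in order and repeats until the expression
  // stops changing or the strategy's iteration limit is reached.
  @annotation.tailrec
  def runBatch(plan: Expr, batch: Batch, iteration: Int = 0): Expr =
    if (iteration >= batch.strategy.maxIterations) plan
    else {
      val next = batch.rules.foldLeft(plan)((p, rule) => rule(p))
      if (next == plan) plan else runBatch(next, batch, iteration + 1)
    }

  // Runs the batches one after another, feeding each batch the previous result.
  def execute(plan: Expr, batches: Seq[Batch]): Expr =
    batches.foldLeft(plan)((p, batch) => runBatch(p, batch))
}

object ToyRuleExecutorDemo {
  def main(args: Array[String]): Unit = {
    val plan = Mul(Add(Add(Lit(1), Lit(2)), Lit(3)), Lit(1))
    val batches = Seq(
      Batch("Simplify Multiplication", Once, ToyRuleExecutor.removeMulByOne),
      Batch("Constant Folding", FixedPoint(10), ToyRuleExecutor.foldOnce)
    )
    println(ToyRuleExecutor.execute(plan, batches)) // prints Lit(6)
  }
}
```

Suppose the constant folding batch used Once instead of FixedPoint(10). It would stop at Add(Lit(3), Lit(3)), because one pass only folds one level of the tree. That is why batches whose rules enable further rewrites are typically run to a fixed point.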
Tip
Enable logging for SparkOptimizer. Refer to Logging.
Creating SparkOptimizer Instance
SparkOptimizer takes the following when created:
Note
SparkOptimizer is created when SessionState is created (which initializes the optimizer property).
Further reading or watching
- (video) Modern Spark DataFrame and Dataset (Intermediate Tutorial) by Adam Breindel from Databricks.