SQLConf

SQLConf is an internal key-value configuration store for parameters and hints used in Spark SQL.

Note

SQLConf is not meant to be used directly. Use the user-facing RuntimeConfig interface instead, which you can access through a SparkSession (as spark.conf).

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

scala> spark.conf
res0: org.apache.spark.sql.RuntimeConfig = org.apache.spark.sql.RuntimeConfig@6b58a0f9
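With RuntimeConfig you get and set configuration properties by their string keys. A minimal sketch (in spark-shell; the res numbers and output depend on your session):

scala> spark.conf.get("spark.sql.shuffle.partitions")
res1: String = 200

scala> spark.conf.set("spark.sql.shuffle.partitions", 8)

scala> spark.conf.get("spark.sql.shuffle.partitions")
res3: String = 8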

SQLConf offers methods to get, set, unset and clear values of configuration parameters, as well as accessor methods to read the current value of a given parameter or hint.

You can access a session-specific SQLConf using SessionState:

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

import spark.sessionState.conf

// accessing properties through accessor methods
scala> conf.numShufflePartitions
res0: Int = 200

// setting properties using aliases
import org.apache.spark.sql.internal.SQLConf.SHUFFLE_PARTITIONS
conf.setConf(SHUFFLE_PARTITIONS, 2)
scala> conf.numShufflePartitions
res2: Int = 2

// unset aka reset properties to the default value
conf.unsetConf(SHUFFLE_PARTITIONS)
scala> conf.numShufflePartitions
res4: Int = 200
Table 1. SQLConf’s Accessor Methods (in alphabetical order)
Name Parameter / Hint Description

adaptiveExecutionEnabled

spark.sql.adaptive.enabled

Used exclusively in EnsureRequirements to add an ExchangeCoordinator (when adaptive query execution is enabled)

autoBroadcastJoinThreshold

spark.sql.autoBroadcastJoinThreshold

Used exclusively in JoinSelection execution planning strategy

broadcastTimeout

spark.sql.broadcastTimeout

Used exclusively in BroadcastExchangeExec (for broadcasting a table to executors).

columnBatchSize

spark.sql.inMemoryColumnarStorage.batchSize

Used…​FIXME

dataFramePivotMaxValues

spark.sql.pivotMaxValues

Used exclusively in pivot operator.

dataFrameRetainGroupColumns

spark.sql.retainGroupColumns

Used exclusively in RelationalGroupedDataset when creating the result Dataset (after agg, count, mean, max, avg, min, and sum operators).

defaultSizeInBytes

spark.sql.defaultSizeInBytes

Used…​FIXME

joinReorderEnabled

spark.sql.cbo.joinReorder.enabled

Used exclusively in CostBasedJoinReorder logical plan optimization

limitScaleUpFactor

spark.sql.limit.scaleUpFactor

Used exclusively when a physical operator is requested to take the first n rows as an array.

numShufflePartitions

spark.sql.shuffle.partitions

Used…FIXME

preferSortMergeJoin

spark.sql.join.preferSortMergeJoin

Used exclusively in JoinSelection execution planning strategy to prefer sort merge join over shuffle hash join.

starSchemaDetection

spark.sql.cbo.starSchemaDetection

Used exclusively in ReorderJoin logical plan optimization (and indirectly in StarSchemaDetection)

useCompression

spark.sql.inMemoryColumnarStorage.compressed

Used…FIXME

useObjectHashAggregation

spark.sql.execution.useObjectHashAggregateExec

Used exclusively in Aggregation execution planning strategy when selecting a physical plan.

wholeStageEnabled

spark.sql.codegen.wholeStage

Used in CollapseCodegenStages physical preparation rule (to control whole-stage code generation)

wholeStageFallback

spark.sql.codegen.fallback

Used exclusively when WholeStageCodegenExec is executed.

wholeStageMaxNumFields

spark.sql.codegen.maxFields

Used in CollapseCodegenStages physical preparation rule (to decide whether a plan has too many output fields to support whole-stage code generation)

windowExecBufferSpillThreshold

spark.sql.windowExec.buffer.spill.threshold

Used exclusively when WindowExec unary physical operator is executed.
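The accessor methods are typed wrappers over the underlying string properties, so reading a property through getConfString and through its accessor gives consistent results. A quick sketch (assuming the default configuration):

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

import spark.sessionState.conf

scala> conf.getConfString("spark.sql.shuffle.partitions")
res0: String = 200

scala> conf.numShufflePartitions
res1: Int = 200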

Table 2. Parameters and Hints (in alphabetical order)
Name Default Value Description

spark.sql.adaptive.enabled

false

Enables adaptive query execution

Note
Adaptive query execution is not supported for streaming Datasets and is disabled at execution time.

Use adaptiveExecutionEnabled method to access the current value.

spark.sql.autoBroadcastJoinThreshold

10L * 1024 * 1024

Maximum size (in bytes) for a table that will be broadcast to all worker nodes when performing a join.

If the estimated size of a table (from the statistics of its logical plan) is at most this setting, the table is broadcast for the join.

Negative values or 0 disable broadcasting.

Use autoBroadcastJoinThreshold method to access the current value.

spark.sql.broadcastTimeout

5 * 60

Timeout in seconds for the broadcast wait time in broadcast joins.

When negative, it is assumed infinite (i.e. Duration.Inf)

Use broadcastTimeout method to access the current value.

spark.sql.cbo.enabled

false

Enables cost-based optimization (CBO) for estimation of plan statistics.

Used (through SQLConf.cboEnabled method) in CostBasedJoinReorder logical plan optimization (among others)

spark.sql.cbo.joinReorder.enabled

false

Enables join reorder for cost-based optimization (CBO).

Use joinReorderEnabled method to access the current value.

spark.sql.cbo.starSchemaDetection

false

Enables join reordering based on star schema detection for cost-based optimization (CBO) in ReorderJoin logical plan optimization.

Use starSchemaDetection method to access the current value.

spark.sql.codegen.fallback

true

(internal) Whether whole-stage codegen can be temporarily disabled for the part of a query that has failed to compile generated code (true) or not (false).

Use wholeStageFallback method to access the current value.

spark.sql.codegen.maxFields

100

(internal) Maximum number of output fields (including nested fields) that whole-stage codegen supports. Going above the number deactivates whole-stage codegen.

Use wholeStageMaxNumFields method to access the current value.

spark.sql.codegen.wholeStage

true

(internal) Whether the whole stage (of multiple physical operators) will be compiled into a single Java method (true) or not (false).

Use wholeStageEnabled method to access the current value.

spark.sql.defaultSizeInBytes

Java’s Long.MaxValue

(internal) Table size used in query planning.

It defaults to Java’s Long.MaxValue, which is larger than spark.sql.autoBroadcastJoinThreshold, to be conservative: by default the optimizer will not choose to broadcast a table unless it knows for sure that the table is small enough.

Use defaultSizeInBytes method to access the current value.

spark.sql.execution.useObjectHashAggregateExec

true

Flag to enable ObjectHashAggregateExec in Aggregation execution planning strategy.

Use useObjectHashAggregation method to access the current value.

spark.sql.inMemoryColumnarStorage.batchSize

10000

(internal) Controls…​FIXME

Use columnBatchSize method to access the current value.

spark.sql.inMemoryColumnarStorage.compressed

true

(internal) Controls…​FIXME

Use useCompression method to access the current value.

spark.sql.join.preferSortMergeJoin

true

(internal) Controls JoinSelection execution planning strategy to prefer sort merge join over shuffle hash join.

Use preferSortMergeJoin method to access the current value.

spark.sql.limit.scaleUpFactor

4

(internal) Minimal increase rate in the number of partitions between attempts when executing take operator on a structured query. Higher values lead to more partitions read. Lower values might lead to longer execution times as more jobs will be run.

Use limitScaleUpFactor method to access the current value.

spark.sql.optimizer.maxIterations

100

Maximum number of iterations for Analyzer and Optimizer.

spark.sql.pivotMaxValues

10000

Maximum number of (distinct) values that will be collected without error (when doing a pivot without specifying the values for the pivot column)

Use dataFramePivotMaxValues method to access the current value.

spark.sql.retainGroupColumns

true

Controls whether to retain columns used for aggregation or not (in RelationalGroupedDataset operators).

Use dataFrameRetainGroupColumns method to access the current value.

spark.sql.selfJoinAutoResolveAmbiguity

true

Controls whether to resolve ambiguity in join conditions for self-joins automatically.

spark.sql.shuffle.partitions

200

Default number of partitions to use when shuffling data for joins or aggregations.

Corresponds to Apache Hive’s mapred.reduce.tasks property, which Spark considers deprecated.

Use numShufflePartitions method to access the current value.

spark.sql.streaming.fileSink.log.cleanupDelay

FIXME

FIXME

spark.sql.streaming.fileSink.log.compactInterval

FIXME

FIXME

spark.sql.streaming.fileSink.log.deletion

true

Controls whether to delete the expired log files in file stream sink.

spark.sql.streaming.schemaInference

FIXME

FIXME

spark.sql.windowExec.buffer.spill.threshold

4096

(internal) Threshold for the number of rows buffered in the window operator

Use windowExecBufferSpillThreshold method to access the current value.
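Several of the parameters above are commonly tuned together, e.g. when troubleshooting join performance. The following sketch shows one possible combination (not a recommendation):

// disable broadcast joins entirely (see spark.sql.autoBroadcastJoinThreshold above)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

// use fewer shuffle partitions for small datasets
spark.conf.set("spark.sql.shuffle.partitions", 8)

// fall back to non-codegen execution, e.g. while debugging generated code
spark.conf.set("spark.sql.codegen.wholeStage", false)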

Note
SQLConf is a private[sql] serializable class in org.apache.spark.sql.internal package.

Getting Parameters and Hints

You can get the current parameters and hints using the following family of get methods.

getConfString(key: String): String
getConfString(key: String, defaultValue: String): String
getConf[T](entry: ConfigEntry[T]): T
getConf[T](entry: ConfigEntry[T], defaultValue: T): T
getConf[T](entry: OptionalConfigEntry[T]): Option[T]
getAllConfs: immutable.Map[String, String]
getAllDefinedConfs: Seq[(String, String, String)]
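For example (a sketch using the session-specific SQLConf; output depends on your configuration):

import org.apache.spark.sql.internal.SQLConf
import spark.sessionState.conf

// type-safe access through a ConfigEntry
scala> conf.getConf(SQLConf.SHUFFLE_PARTITIONS)
res0: Int = 200

// string-based access with a fallback value for keys that are not set
scala> conf.getConfString("spark.sql.sources.default", "parquet")
res1: String = parquet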

Setting Parameters and Hints

You can set parameters and hints using the following family of set methods.

setConf(props: Properties): Unit
setConfString(key: String, value: String): Unit
setConf[T](entry: ConfigEntry[T], value: T): Unit
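For example (a sketch; all three flavors end up in the same underlying configuration map):

import java.util.Properties
import org.apache.spark.sql.internal.SQLConf
import spark.sessionState.conf

// set from a Properties object
val props = new Properties()
props.setProperty("spark.sql.shuffle.partitions", "8")
conf.setConf(props)

// set by string key and string value
conf.setConfString("spark.sql.cbo.enabled", "true")

// type-safe setting through a ConfigEntry
conf.setConf(SQLConf.CBO_ENABLED, true)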

Unsetting Parameters and Hints

You can unset parameters and hints using the following family of unset methods.

unsetConf(key: String): Unit
unsetConf(entry: ConfigEntry[_]): Unit
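Continuing the sketch above, unsetting removes the explicit value so the default applies again:

conf.unsetConf("spark.sql.cbo.enabled")
conf.unsetConf(SQLConf.SHUFFLE_PARTITIONS)

scala> conf.numShufflePartitions
res0: Int = 200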

Clearing All Parameters and Hints

You can use clear to remove all the parameters and hints in SQLConf.

clear(): Unit
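For example, clearing all explicitly-set values restores the defaults:

scala> conf.clear()

scala> conf.numShufflePartitions
res0: Int = 200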
