Settings

The following settings are used to configure Spark SQL applications.

You can set them in a SparkSession upon instantiation using the config method.

import org.apache.spark.sql.SparkSession
val spark: SparkSession = SparkSession.builder
  .master("local[*]")
  .appName("My Spark Application")
  .config("spark.sql.warehouse.dir", "c:/Temp") (1)
  .getOrCreate

1. Sets spark.sql.warehouse.dir for the Spark SQL session
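
Settings can also be read and, for runtime properties, changed after the session is up through spark.conf. A minimal sketch (the property names and values below are only examples):

// spark is the SparkSession created above
spark.conf.get("spark.sql.warehouse.dir")               // read a setting
spark.conf.set("spark.sql.shuffle.partitions", 8)       // change a runtime setting
spark.conf.getAll.filter { case (k, _) => k.startsWith("spark.sql.") }  // list Spark SQL settings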

Table 1. Spark SQL Properties (in alphabetical order)

spark.sql.catalogImplementation (default: in-memory)

(internal) Selects the active catalog implementation from:

  • in-memory

  • hive

Tip: You can enable Hive support in a SparkSession using the enableHiveSupport builder method.
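
A minimal sketch of a Hive-enabled session (assuming Spark is built with Hive support available on the classpath):

import org.apache.spark.sql.SparkSession

// enableHiveSupport switches spark.sql.catalogImplementation to "hive"
val spark: SparkSession = SparkSession.builder
  .master("local[*]")
  .appName("Hive-enabled Spark Application")
  .enableHiveSupport()
  .getOrCreate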

spark.sql.sources.default (default: parquet)

Defines the default data source to use for DataFrameReader.

Used when:
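
A minimal sketch of the default data source in action (paths are only examples; spark is an existing SparkSession):

import spark.implicits._

// save() without format() uses spark.sql.sources.default, i.e. parquet
Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
  .write.save("/tmp/people")

// load() without format() also falls back to spark.sql.sources.default
val people = spark.read.load("/tmp/people")

// Override the default data source for the session, e.g. with JSON
spark.conf.set("spark.sql.sources.default", "json")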

spark.sql.warehouse.dir

spark.sql.warehouse.dir (default: ${system:user.dir}/spark-warehouse) is the default location of the Hive warehouse directory (using Derby) where managed databases and tables are stored.

See also the official Hive Metastore Administration document.
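
A minimal sketch of where managed tables end up (the warehouse path in the comments is only an example):

// spark is an existing SparkSession created with
// .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
spark.range(5).write.saveAsTable("ids")   // creates a managed table
// The table's data files live under the warehouse directory,
// e.g. /tmp/spark-warehouse/ids
println(spark.conf.get("spark.sql.warehouse.dir"))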

spark.sql.parquet.filterPushdown

spark.sql.parquet.filterPushdown (default: true) is a flag to control the filter predicate push-down optimization for data sources using the parquet file format.
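
A minimal sketch of toggling the optimization and inspecting its effect (the input path and column are only examples):

import org.apache.spark.sql.functions.col

// spark is an existing SparkSession
spark.conf.set("spark.sql.parquet.filterPushdown", false)  // disable push-down

val df = spark.read.parquet("/tmp/people.parquet")
// With push-down enabled (the default), the parquet scan in the physical plan
// typically reports the filter under PushedFilters
df.filter(col("age") > 21).explain()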

spark.sql.allowMultipleContexts

spark.sql.allowMultipleContexts (default: true) controls whether creating multiple SQLContexts/HiveContexts is allowed.
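
A minimal sketch of turning the check on at session creation (the value is only an example):

import org.apache.spark.sql.SparkSession

// Fail fast if more than one SQLContext/HiveContext gets created
val spark = SparkSession.builder
  .master("local[*]")
  .appName("Single-context application")
  .config("spark.sql.allowMultipleContexts", "false")
  .getOrCreate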

spark.sql.columnNameOfCorruptRecord

spark.sql.columnNameOfCorruptRecord…​FIXME

spark.sql.dialect

spark.sql.dialect - FIXME

spark.sql.streaming.checkpointLocation

spark.sql.streaming.checkpointLocation is the default location for storing checkpoint data for continuously executing queries.
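
A minimal sketch with Structured Streaming (assuming the built-in rate source; the checkpoint path is only an example):

// spark is an existing SparkSession
spark.conf.set("spark.sql.streaming.checkpointLocation", "/tmp/checkpoints")

// Queries started without an explicit checkpointLocation option
// use a subdirectory of the default location above
val query = spark.readStream
  .format("rate")        // test source emitting rows at a fixed rate
  .load()
  .writeStream
  .format("console")
  .start()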
