fromSparkConf(_conf: SparkConf, module: String, numUsableCores: Int = 0): TransportConf
TransportConf — Transport Configuration
TransportConf is a class for the transport-related network configuration for modules, e.g. ExternalShuffleService or YarnShuffleService.
It exposes methods to access settings for a single module as spark.module.prefix or general network-related settings.
Creating TransportConf from SparkConf — fromSparkConf Method
|
Note
|
fromSparkConf belongs to SparkTransportConf object.
|
fromSparkConf creates a TransportConf for module from the given SparkConf.
Internally, fromSparkConf calculates the default number of threads for both the Netty client and server thread pools.
fromSparkConf uses spark.[module].io.serverThreads and spark.[module].io.clientThreads if specified for the number of threads to use. If not defined, fromSparkConf sets them to the default number of threads calculated earlier.
Calculating Default Number of Threads (8 Maximum) — defaultNumThreads Internal Method
defaultNumThreads(numUsableCores: Int): Int
|
Note
|
defaultNumThreads belongs to SparkTransportConf object.
|
defaultNumThreads calculates the default number of threads for both the Netty client and server thread pools that is 8 maximum or numUsableCores is smaller. If numUsableCores is not specified, defaultNumThreads uses the number of processors available to the Java virtual machine.
|
Note
|
8 is the maximum number of threads for Netty and is not configurable. |
|
Note
|
defaultNumThreads uses Java’s Runtime for the number of processors in JVM.
|
spark.module.prefix Settings
The settings can be in the form of spark.[module].[prefix] with the following prefixes:
-
io.mode(default:NIO) — the IO mode:nioorepoll. -
io.preferDirectBufs(default:true) — a flag to control whether Spark prefers allocating off-heap byte buffers within Netty (true) or not (false). -
io.connectionTimeout(default: spark.network.timeout or120s) — the connection timeout in milliseconds. -
io.backLog(default:-1for no backlog) — the requested maximum length of the queue of incoming connections. -
io.numConnectionsPerPeer(default:1) — the number of concurrent connections between two nodes for fetching data. -
io.serverThreads(default:0i.e. 2x#cores) — the number of threads used in the server thread pool. -
io.clientThreads(default:0i.e. 2x#cores) — the number of threads used in the client thread pool. -
io.receiveBuffer(default:-1) — the receive buffer size (SO_RCVBUF). -
io.sendBuffer(default:-1) — the send buffer size (SO_SNDBUF). -
sasl.timeout(default:30s) — the timeout (in milliseconds) for a single round trip of SASL token exchange. -
io.maxRetries(default:3) — the maximum number of times Spark will try IO exceptions (such as connection timeouts) per request. If set to0, Spark will not do any retries. -
io.retryWait(default:5s) — the time (in milliseconds) that Spark will wait in order to perform a retry after anIOException. Only relevant ifio.maxRetries> 0. -
io.lazyFD(default:true) — controls whether to initializeFileDescriptorlazily (true) or not (false). Iftrue, file descriptors are created only when data is going to be transferred. This can reduce the number of open files.
General Network-Related Settings
spark.storage.memoryMapThreshold
spark.storage.memoryMapThreshold (default: 2m) is the minimum size of a block that we should start using memory map rather than reading in through normal IO operations.
This prevents Spark from memory mapping very small blocks. In general, memory mapping has high overhead for blocks close to or below the page size of the OS.