CacheManager — In-Memory Cache for Tables and Views

CacheManager is an in-memory cache for tables and views, which it identifies by their logical plans. It tracks each logical plan and its cached InMemoryRelation representation in the internal cachedData collection of CachedData entries.

CacheManager is shared across SparkSessions through SharedState.

sparkSession.sharedState.cacheManager
Note
Spark developers use CacheManager indirectly when they cache Datasets with the cache or persist operators.
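The following is a minimal sketch of the two operators through the public Dataset API. It assumes Spark SQL 2.1 or later on the classpath (Dataset.storageLevel was added in 2.1) and a local master; the object and method names are illustrative only. With cacheQuery's default storage level, cache() should report MEMORY_AND_DISK, while persist reports whatever level it was given.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Illustrative demo object (hypothetical name), not part of Spark.
object CacheOperatorsDemo {
  /** Caches one Dataset with cache() and another with persist(level); returns both storage levels. */
  def storageLevels(): (StorageLevel, StorageLevel) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cache-operators-demo")
      .getOrCreate()
    try {
      // cache() uses the default storage level
      val viaCache = spark.range(10).cache()
      // persist(level) passes an explicit storage level
      val viaPersist = spark.range(20).persist(StorageLevel.MEMORY_ONLY)
      (viaCache.storageLevel, viaPersist.storageLevel)
    } finally spark.stop()
  }
}
```

Both operators are lazy with respect to the data: they register the plan right away, but the cached blocks are only filled in when an action runs.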

Cached Queries — cachedData Internal Registry

cachedData is a collection of CachedData with logical plans and their cached InMemoryRelation representation.

cachedData is cleared when…FIXME

invalidateCachedPath Method

Caution
FIXME

invalidateCache Method

Caution
FIXME

lookupCachedData Method

Caution
FIXME

uncacheQuery Method

Caution
FIXME

isEmpty Method

Caution
FIXME

Caching Dataset (by Registering Logical Plan as InMemoryRelation) — cacheQuery Method

cacheQuery(
  query: Dataset[_],
  tableName: Option[String] = None,
  storageLevel: StorageLevel = MEMORY_AND_DISK): Unit
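The tableName parameter is how a cache entry gets a name. A minimal sketch through the public Catalog API, assuming that spark.catalog.cacheTable reaches cacheQuery with Some(tableName) and a local Spark 2.x or 3.x session; the temporary view name nums and the demo object are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative demo object (hypothetical name), not part of Spark.
object CacheTableDemo {
  /** Returns Catalog.isCached for a temporary view before and after cacheTable. */
  def cachedBeforeAndAfter(): (Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cache-table-demo")
      .getOrCreate()
    try {
      // Hypothetical view name used only for this sketch
      spark.range(5).createOrReplaceTempView("nums")
      val before = spark.catalog.isCached("nums")
      // Caches the view's logical plan under the name "nums"
      spark.catalog.cacheTable("nums")
      (before, spark.catalog.isCached("nums"))
    } finally spark.stop()
  }
}
```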

Internally, cacheQuery registers the logical plan of the input query in the cachedData internal registry of cached queries.

While registering, cacheQuery creates an InMemoryRelation for that logical plan, using the input storageLevel and tableName.
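To see the registration at work, the sketch below counts InMemoryRelation nodes in a query's plan after cached-data substitution (QueryExecution.withCachedData), once before and once after caching the query's input. It assumes a local Spark 2.x or 3.x session and matches the node by nodeName so that it does not depend on InMemoryRelation's package or visibility; the demo object and helper names are hypothetical. Under these assumptions the count should go from 0 to 1.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Illustrative demo object (hypothetical name), not part of Spark.
object InMemoryRelationDemo {
  // Number of InMemoryRelation nodes in the plan after cached-data substitution.
  private def inMemoryNodes(ds: Dataset[_]): Int =
    ds.queryExecution.withCachedData.collect {
      case node if node.nodeName == "InMemoryRelation" => node
    }.size

  /** Counts InMemoryRelation nodes in a filter query, before and after caching its input. */
  def countsBeforeAndAfterCache(): (Int, Int) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("in-memory-relation-demo")
      .getOrCreate()
    try {
      val base = spark.range(100)
      val before = inMemoryNodes(base.filter("id > 50"))
      // Registers base's logical plan with the CacheManager
      base.cache()
      // A new query over base now has base's plan replaced by the cached relation
      val after = inMemoryNodes(base.filter("id > 50"))
      (before, after)
    } finally spark.stop()
  }
}
```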

If, however, the input query has already been cached, cacheQuery only prints the following WARN message to the logs and exits:

WARN CacheManager: Asked to cache already cached data.
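A minimal sketch of the already-cached path through the public API, assuming Dataset.persist delegates to cacheQuery (a local Spark 2.1+ session; the demo object is hypothetical): persisting the same Dataset a second time with a different level should only log the warning, so the first storage level stays in effect.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Illustrative demo object (hypothetical name), not part of Spark.
object DuplicateCacheDemo {
  /** Persists the same Dataset twice and returns the storage level that sticks. */
  def levelAfterSecondPersist(): StorageLevel = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("duplicate-cache-demo")
      .getOrCreate()
    try {
      val ds = spark.range(10)
      ds.persist(StorageLevel.MEMORY_ONLY)
      // Plan is already cached: expected to log the WARN message and change nothing
      ds.persist(StorageLevel.DISK_ONLY)
      ds.storageLevel
    } finally spark.stop()
  }
}
```

To change the level of an already cached Dataset, unpersist it first and then persist it again.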
Note
cacheQuery is used when a Dataset is requested to cache or persist, and when CatalogImpl is requested to cacheTable.

Removing All Cached Tables From In-Memory Cache — clearCache Method

clearCache(): Unit

clearCache acquires a write lock, unpersists the RDD[CachedBatch] of every cached query in cachedData, and then removes all the entries.
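The locking discipline can be modeled without Spark. The sketch below is a toy registry, not Spark's CacheManager: MiniCacheManager and CachedEntry are hypothetical names, plans are represented as plain strings, and the unpersist callback stands in for unpersisting an RDD[CachedBatch]. Mutations (cacheQuery, clearCache) take the write lock and reads (isEmpty) take the read lock of a java.util.concurrent.locks.ReentrantReadWriteLock.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for CachedData: a plan key plus a hook that drops its cached data.
final case class CachedEntry(plan: String, unpersist: () => Unit)

// Toy model (not Spark's class) of a cache registry guarded by a read/write lock.
final class MiniCacheManager {
  private val lock = new ReentrantReadWriteLock()
  private val cachedData = ArrayBuffer.empty[CachedEntry]

  private def readLock[A](body: => A): A = {
    val l = lock.readLock()
    l.lock()
    try body finally l.unlock()
  }

  private def writeLock[A](body: => A): A = {
    val l = lock.writeLock()
    l.lock()
    try body finally l.unlock()
  }

  /** Registers the entry unless its plan is already cached; returns whether it was added. */
  def cacheQuery(entry: CachedEntry): Boolean = writeLock {
    if (cachedData.exists(_.plan == entry.plan)) {
      // Stand-in for a logWarning call
      println("WARN Asked to cache already cached data.")
      false
    } else {
      cachedData += entry
      true
    }
  }

  def isEmpty: Boolean = readLock { cachedData.isEmpty }

  /** Unpersists every entry's data, then drops all entries, all under the write lock. */
  def clearCache(): Unit = writeLock {
    cachedData.foreach(_.unpersist())
    cachedData.clear()
  }
}
```

Holding the write lock for the whole of clearCache means no concurrent reader can observe an entry whose data has already been unpersisted but which is still registered.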

Note
clearCache is executed when the CatalogImpl is requested to clearCache.
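From user code, the same effect is reachable through spark.catalog.clearCache(). A minimal sketch, assuming a local Spark 2.1+ session (the demo object is hypothetical): after the call, Datasets that were cached earlier should report StorageLevel.NONE, because no CachedData entry matches their plans any longer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Illustrative demo object (hypothetical name), not part of Spark.
object ClearCacheDemo {
  /** Caches two Datasets, clears the cache, and returns their storage levels afterwards. */
  def levelsAfterClearCache(): (StorageLevel, StorageLevel) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("clear-cache-demo")
      .getOrCreate()
    try {
      val a = spark.range(10).cache()
      val b = spark.range(20).persist(StorageLevel.MEMORY_ONLY)
      // Removes every entry from the shared cache
      spark.catalog.clearCache()
      (a.storageLevel, b.storageLevel)
    } finally spark.stop()
  }
}
```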

CachedData

Caution
FIXME
