User-Friendly Names Of Cached Queries in web UI’s Storage Tab

As you may have noticed, web UI’s Storage tab displays some cached queries with user-friendly RDD names (e.g. "In-memory table [name]") while others not (e.g. "Scan JDBCRelation…​").

spark sql caching webui storage.png
Figure 1. Cached Queries in web UI (Storage Tab)

"In-memory table [name]" RDD names are the result of SQL’s CACHE TABLE or when Catalog is requested to cache a table.

// register Dataset as temporary view (table)
spark.range(1).createOrReplaceTempView("one")
// caching is lazy and won't happen until an action is executed
val one = spark.table("one").cache
// The following gives "*Range (0, 1, step=1, splits=8)"
// WHY?!
one.show

scala> spark.catalog.isCached("one")
res0: Boolean = true

one.unpersist

import org.apache.spark.storage.StorageLevel
// caching is lazy
spark.catalog.cacheTable("one", StorageLevel.MEMORY_ONLY)
// The following gives "In-memory table one"
one.show

spark.range(100).createOrReplaceTempView("hundred")
// SQL's CACHE TABLE is eager
// The following gives "In-memory table `hundred`"
// WHY single quotes?
spark.sql("CACHE TABLE hundred")

// register Dataset under name
val ds = spark.range(20)
spark.sharedState.cacheManager.cacheQuery(ds, Some("twenty"))
// trigger an action
ds.head

The other RDD names are due to caching a Dataset.

val ten = spark.range(10).cache
ten.head

results matching ""

    No results matching ""