StorageLevel

StorageLevel describes how an RDD is persisted and addresses the following concerns (each maps to a flag in the sketch after the list):

  • Does RDD use disk?

  • How much of RDD is in memory?

  • Does RDD use off-heap memory?

  • Should an RDD be serialized (while persisting)?

  • How many replicas (default: 1) to use (replication must be less than 40)?
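
These concerns map directly to the flags of a StorageLevel. A minimal sketch of building a custom level through the public StorageLevel factory (assuming the Spark shell; the flag values are purely illustrative):

import org.apache.spark.storage.StorageLevel

// flags in order: useDisk, useMemory, useOffHeap, deserialized, replication
// replication has to stay below 40
val level = StorageLevel(false, true, false, true, 2)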

The following StorageLevels are available (the _2 suffix in a name denotes 2 replicas); a persist example follows the list:

  • NONE (default)

  • DISK_ONLY

  • DISK_ONLY_2

  • MEMORY_ONLY (default for the cache operation on RDDs)

  • MEMORY_ONLY_2

  • MEMORY_ONLY_SER

  • MEMORY_ONLY_SER_2

  • MEMORY_AND_DISK

  • MEMORY_AND_DISK_2

  • MEMORY_AND_DISK_SER

  • MEMORY_AND_DISK_SER_2

  • OFF_HEAP
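
Any of these levels can be passed to the persist operation. A minimal sketch (assuming the Spark shell, where sc is already available; the file name is only an example):

import org.apache.spark.storage.StorageLevel

val data = sc.textFile("data.txt")
// keep two replicas and spill to disk when memory runs out
data.persist(StorageLevel.MEMORY_AND_DISK_2)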

You can check the current storage level of an RDD using the getStorageLevel operation.

val lines = sc.textFile("README.md")

scala> lines.getStorageLevel
res0: org.apache.spark.storage.StorageLevel = StorageLevel(disk=false, memory=false, offheap=false, deserialized=false, replication=1)
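
Once the RDD is cached, getStorageLevel reflects the new level. A short follow-up sketch (the comments describe the expected result rather than verbatim shell output):

scala> lines.cache()          // cache() persists with MEMORY_ONLY
scala> lines.getStorageLevel  // now reports memory=true, deserialized=true, replication=1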
