When will Spark clean the cached RDDs automatically?

RDDs that have been cached with the rdd.cache() method from the Scala shell are stored in memory.

That means they consume part of the RAM available to the Spark process itself.

Given that RAM is limited, as more and more RDDs get cached, when will Spark automatically free the memory occupied by the RDD cache?
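
For concreteness, a minimal sketch of the caching pattern described above, as run from spark-shell (which provides sc as the SparkContext; the input path data.txt and the variable names are placeholders):

    val lines   = sc.textFile("data.txt")
    val lengths = lines.map(_.length)

    lengths.cache()        // mark the RDD for in-memory storage (default: MEMORY_ONLY)
    lengths.count()        // the first action materializes and caches the partitions
    lengths.reduce(_ + _)  // later actions reuse the cached partitions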

asked Oct 21 '25 by KayV

1 Answer

Spark will clean cached RDDs and Datasets / DataFrames:

  • When it is explicitly asked to, by calling the RDD.unpersist (see How to uncache RDD?) or Dataset.unpersist methods, or Catalog.clearCache (a short sketch follows this list).
  • In regular intervals, by the cache cleaner:

    Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.

  • When the corresponding distributed data structure is garbage collected.
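
A minimal sketch of the first and last points, assuming a spark-shell session (which provides spark as the SparkSession and sc as the SparkContext); the variable names and sizes are illustrative:

    import org.apache.spark.storage.StorageLevel

    // Explicit cleanup.
    val rdd = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY)
    rdd.count()                 // materialize the cached partitions
    rdd.unpersist()             // drop this RDD's cached blocks

    val ds = spark.range(1000000).cache()
    ds.count()
    ds.unpersist()              // same idea for Datasets / DataFrames

    spark.catalog.clearCache()  // drop every cached table / Dataset at once

    // GC-driven cleanup: once no live reference to a cached RDD remains,
    // Spark's ContextCleaner can eventually remove its cached blocks too.
    var tmp = sc.parallelize(1 to 1000).cache()
    tmp.count()
    tmp = null                  // release the reference so the cleaner may reclaim it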

answered Oct 26 '25 by user9068240