I used cache() to cache the data in memory, but I realized that to see the performance without cached data I need to uncache it and remove the data from memory:
rdd.cache();
//doing some computation
...
rdd.uncache()
but I got an error that said:
value uncache is not a member of org.apache.spark.rdd.RDD[(Int, Array[Float])]
So I don't know how to uncache the RDD!
Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.
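For example, a minimal sketch of that workflow (assuming an existing SparkContext named sc and the same (Int, Array[Float]) RDD shape as in the question):
val rdd = sc.parallelize(1 to 1000).map(i => (i, Array.fill(10)(i.toFloat)))
rdd.cache()       // mark the RDD for in-memory caching
rdd.count()       // first action materializes and caches the partitions
rdd.count()       // second action reads from the cache
rdd.unpersist()   // remove all cached blocks from memory (and disk)
rdd.count()       // recomputed from the original data, no cache involved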
Caching RDDs in Spark: there are two function calls for caching an RDD, cache() and persist(level: StorageLevel). The difference between them is that cache() will cache the RDD into memory, whereas persist(level) can cache it in memory, on disk, or in off-heap memory according to the caching strategy specified by level.
By default, each transformed RDD may be recomputed each time you run an action on it. However, you may also persist an RDD in memory using the persist (or cache) method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it.
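As a rough sketch of how the two calls differ (the level names come from org.apache.spark.storage.StorageLevel; sc is assumed to be your SparkContext):
import org.apache.spark.storage.StorageLevel

val data = sc.parallelize(1 to 100).map(i => (i, Array.fill(4)(i.toFloat)))
data.cache()                                // for RDDs this is shorthand for persist(StorageLevel.MEMORY_ONLY)
data.unpersist()                            // an RDD's storage level can't be changed while it is still persisted
data.persist(StorageLevel.MEMORY_AND_DISK)  // partitions that don't fit in memory spill to disk instead
data.count()                                // running an action is what actually populates the cache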
An RDD can be uncached using unpersist():
rdd.unpersist()
The uncache function doesn't exist. I think you were looking for unpersist, which, according to the Spark ScalaDoc, marks the RDD as non-persistent and removes all blocks for it from memory and disk.
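unpersist also takes an optional blocking flag; a small hedged example (the default value of blocking has changed between Spark releases, so check the ScalaDoc for your version):
rdd.unpersist(blocking = true)   // block until all cached blocks are actually deleted
rdd.unpersist()                  // use the default, which in recent Spark versions removes blocks asynchronously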
If you want to remove all the cached RDDs, use this:
for ((k, v) <- sc.getPersistentRDDs) {
  v.unpersist()
}
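Here sc.getPersistentRDDs returns a Map[Int, RDD[_]] keyed by the internal RDD id, so k is the id and v is the RDD itself. A quick way to verify the loop worked (again assuming sc is your SparkContext):
println(sc.getPersistentRDDs.size)   // should print 0 once everything has been unpersisted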