How can I unpersist RDDs that were generated in an MLlib model for which I don't have a reference?
I know that in PySpark you can unpersist all DataFrames with sqlContext.clearCache();
is there something similar for RDDs in the Scala API? Furthermore, is there a way to unpersist only some RDDs without having to unpersist all of them?
Marks the DataFrame as non-persistent, and removes all blocks for it from memory and disk.
(transitive, computing) To remove from permanent storage; to make temporary again.
A call to gc.collect() also usually works. Almost. You should remove the last reference to the RDD (i.e. del thisRDD), and then, if you really need the RDD to be unpersisted immediately, call gc.collect().
There are two ways to create RDDs: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat.
You can call
val rdds = sparkContext.getPersistentRDDs // result is Map[Int, RDD[_]]
and then filter the values to get the ones you want (1):
rdds.filter(x => filterLogic(x._2)).foreach(x => x._2.unpersist())
(1) - written by hand, without compiler - sorry if there's some error, but there shouldn't be ;)
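The two lines above can be sketched end to end. Since a running SparkContext isn't available here, a plain Map[Int, String] of RDD names stands in for the Map[Int, RDD[_]] that sparkContext.getPersistentRDDs returns; the names and the mllib_tmp_ prefix are illustrative assumptions, not Spark conventions:

```scala
// Stand-in sketch: sparkContext.getPersistentRDDs returns Map[Int, RDD[_]].
// Here a Map[Int, String] of (assumed) RDD names plays that role so the
// filtering idiom can be shown without a cluster.
val persistent: Map[Int, String] =
  Map(3 -> "mllib_tmp_features", 7 -> "training_data", 9 -> "mllib_tmp_grad")

// Same shape as rdds.filter(x => filterLogic(x._2)): keep only the entries
// whose value satisfies the predicate.
def matching(rdds: Map[Int, String], drop: String => Boolean): Map[Int, String] =
  rdds.filter { case (_, name) => drop(name) }

val toDrop = matching(persistent, _.startsWith("mllib_tmp_"))
// With real RDDs you would finish with toDrop.values.foreach(_.unpersist()).
```

In real Spark, tagging RDDs with rdd.setName(...) before caching makes a name-based predicate like this practical, and unpersist(blocking = false) lets the blocks be freed asynchronously.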