
How to make shark/spark clear the cache?

When I run my Shark queries, main memory fills up and is never released. This is the output of top:


Mem: 74237344k total, 70080492k used, 4156852k free, 399544k buffers
Swap: 4194288k total, 480k used, 4193808k free, 65965904k cached


This doesn't change even if I kill or stop the Shark, Spark, and Hadoop processes. Right now, the only way to clear the cache is to reboot the machine.

Has anyone faced this issue before? Is it a configuration problem or a known issue in Spark/Shark?
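A note on reading the top output above: in this older top layout, "used" includes buffers and the kernel page cache, and the "cached" figure printed on the Swap line describes RAM, not swap. The sketch below only redoes the arithmetic on the numbers quoted in the question. The function names are hypothetical, and treating buffers plus cached as reclaimable is the usual Linux accounting, assumed here because the question does not state the kernel or top version.

```python
import re

# Figures copied from the top output in the question (values in KiB).
TOP_OUTPUT = """\
Mem: 74237344k total, 70080492k used, 4156852k free, 399544k buffers
Swap: 4194288k total, 480k used, 4193808k free, 65965904k cached
"""


def parse_top_memory(text):
    """Return {'mem': {...}, 'swap': {...}} with integer KiB values."""
    result = {}
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        fields = {name: int(value) for value, name in re.findall(r"(\d+)k (\w+)", rest)}
        if fields:
            result[label.strip().lower()] = fields
    return result


def summarize(stats):
    """Split 'used' into process memory and kernel-held buffers/page cache."""
    mem, swap = stats["mem"], stats["swap"]
    # "cached" sits on the Swap line in this layout but counts RAM pages.
    reclaimable = mem["buffers"] + swap["cached"]
    return {
        "used_by_processes": mem["used"] - reclaimable,
        "reclaimable": reclaimable,
        "effectively_free": mem["free"] + reclaimable,
    }


summary = summarize(parse_top_memory(TOP_OUTPUT))
for key, kib in summary.items():
    print(f"{key}: {kib} KiB")
```

If that accounting holds on this machine, most of the "used" figure is page cache that the kernel normally hands back when applications need memory, which would also explain why it survives stopping the Spark and Hadoop processes.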

venkat asked Dec 11 '13 11:12


People also ask

How do I clear spark cached data?

cache() just calls persist() with the default storage level, so to remove an RDD from the cache, call unpersist() on it.

How do I cache my spark?

Caching methods in Spark:

DISK_ONLY: Persist data on disk only, in serialized format.
MEMORY_ONLY: Persist data in memory only, in deserialized format.
MEMORY_AND_DISK: Persist data in memory; if not enough memory is available, evicted blocks are stored on disk.
OFF_HEAP: Persist data in off-heap memory.



1 Answer

To remove all cached data:

sqlContext.clearCache() 

Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html

If you want to remove a specific DataFrame from the cache:

df.unpersist() 
Henrique Florencio answered Sep 19 '22 15:09