When I run my Shark queries, memory gets hoarded in main memory. This is the result of my top command:
Mem:  74237344k total, 70080492k used, 4156852k free, 399544k buffers
Swap:  4194288k total, 480k used, 4193808k free, 65965904k cached
This doesn't change even if I kill/stop the Shark, Spark, and Hadoop processes. Right now, the only way to clear the cache is to reboot the machine.
Has anyone faced this issue before? Is it a configuration problem or a known issue in Spark/Shark?
cache() just calls persist(), so to remove the cache for an RDD, call unpersist().
Caching (storage) levels in Spark:
DISK_ONLY: persist the data on disk only, in serialized format.
MEMORY_ONLY: persist the data in memory only, in deserialized format.
MEMORY_AND_DISK: persist the data in memory; if there is not enough memory, evicted blocks are stored on disk.
OFF_HEAP: the data is persisted in off-heap memory.
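A minimal Scala sketch of those calls (the app name and sample data are made up for illustration): persist an RDD with an explicit storage level, reuse it, then unpersist it so Spark drops the cached blocks.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("cache-sketch").setMaster("local[*]"))

    val rdd = sc.parallelize(1 to 1000000)

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY);
    // choose another level explicitly if memory is tight.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)

    println(rdd.count()) // first action materialises and caches the RDD
    println(rdd.sum())   // served from the cache

    rdd.unpersist()      // remove the cached blocks from memory/disk
    sc.stop()
  }
}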
To remove all cached data:
sqlContext.clearCache()
Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html
If you want to remove a specific DataFrame from the cache:
df.unpersist()
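A minimal Scala sketch tying both calls together (the session setup and sample DataFrame are made up for illustration): unpersist one cached DataFrame, then clear everything cached in the session.

import org.apache.spark.sql.SparkSession

object ClearCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clear-cache-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2, 3).toDF("value").cache()
    df.count()                    // materialise the cache

    df.unpersist()                // drop just this DataFrame from the cache

    spark.sqlContext.clearCache() // drop everything cached in this session
    // spark.catalog.clearCache() is an equivalent call on the newer Catalog API
    spark.stop()
  }
}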