I run simulations on a 64-bit Windows computer with 64 GB of RAM. Memory use reaches 55%, and after a simulation run finishes I remove all objects in the workspace with rm(list=ls()), followed by a double gc().
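Here is a minimal sketch of how to see what R itself reports before and after the rm()/double gc() step. A throwaway vector stands in for the simulation's objects; the size 1e7 and the helper name vcells_used are illustrative and do not come from the question. gc() returns a matrix whose "Vcells" row counts 8-byte vector cells. This is R's internal accounting, so it can fall back to its starting level even while Windows Task Manager still shows the R process holding the memory.

```r
# Hypothetical helper: R's own count of vector cells (8 bytes each) in use.
# gc() runs a full collection first, then reports usage.
vcells_used <- function() gc()["Vcells", "used"]

before <- vcells_used()
x <- numeric(1e7)        # 1e7 doubles, roughly 76 MB
during <- vcells_used()

rm(x)                    # drop the only name bound to the vector
invisible(gc())          # the "double gc()" from the question
invisible(gc())
after <- vcells_used()

# R-side usage should rise by about 1e7 cells and then fall back;
# this says nothing about what the operating system sees.
c(before = before, during = during, after = after)
```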
I assumed this would free enough memory for the next simulation run, but memory usage actually drops by only 1%. I consulted many different forums but could not find a satisfactory explanation, only vague comments such as:
"Depending on your operating system, the freed up memory might not be returned to the operating system, but kept in the process space."
I'd like to find information on:
In .NET, the garbage collector allocates and frees virtual memory for you on the managed heap. If you're writing native code, you use Windows functions to work with the virtual address space; these functions allocate and free virtual memory for you on native heaps.
In the Java HotSpot JVM, if the free space in the heap exceeds the percentage set with the -XX:MaxHeapFreeRatio option, the GC can shrink the heap and return the unused memory to the OS.
R uses an alternative approach: garbage collection (or GC for short). GC automatically releases memory when an object is no longer used. It does this by periodically tracing which objects can still be reached from a name (directly or through other live objects) and deleting the ones that cannot.
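A small check of the "no longer reachable" behaviour, using base R's reg.finalizer() purely as a probe (the names status, a and b are hypothetical). Two environments that point at each other form a reference cycle, so a scheme based only on counting references would never see either count drop to zero. Once no name reaches the pair, a full gc() should still free them and run the finalizer.

```r
status <- new.env()
status$freed <- FALSE

a <- new.env()
b <- new.env()
a$partner <- b           # a and b point at each other: a reference cycle
b$partner <- a

# Probe: this function runs when the environment `a` is garbage collected.
# Environments are modified in place, so the flag is visible outside.
reg.finalizer(a, function(e) status$freed <- TRUE)

rm(a, b)                 # no name reaches the cycle any more
invisible(gc())          # full collection; pending finalizers run afterwards
status$freed
```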
The rm() function in R deletes objects from the workspace. Combined with ls(), it can delete all objects, as in rm(list=ls()). The remove() function is equivalent to rm().
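A side note on rm(list=ls()), sketched in a scratch environment so it does not touch the real workspace (env, sim_result and .seed_state are hypothetical names). ls() skips names that start with a dot unless all.names = TRUE, so such objects survive the usual idiom. In a simulation session one example is .Random.seed, which R keeps in the global environment once random numbers have been drawn. These objects are usually small, so this alone is unlikely to account for the memory figure above.

```r
env <- new.env()
assign("sim_result", 1:10, envir = env)
assign(".seed_state", 42L, envir = env)   # name starts with a dot

visible <- ls(env)                        # only "sim_result"
rm(list = ls(env), envir = env)           # same pattern as rm(list=ls())
survivors <- ls(env, all.names = TRUE)    # ".seed_state" is still there

rm(list = ls(env, all.names = TRUE), envir = env)
remaining <- ls(env, all.names = TRUE)    # character(0)
```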
The R garbage collector is imperfect in the following (not so) subtle way: it does not move objects (i.e., it does not compact memory) because of the way it interacts with C libraries. (Some other languages/implementations suffer from this too, but others, despite also having to interact with C, manage to have a compacting generational GC which does not suffer from this problem.)
This means that if you take turns allocating small chunks of memory which are then discarded and larger chunks for more permanent objects (a common situation when doing string/regexp processing), your memory becomes fragmented and the garbage collector can do nothing about it: the memory is released, but it cannot be re-used because the free chunks are too small for the larger objects.
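The allocation pattern described above can be sketched as follows, assuming a hypothetical text-processing workload (the names lines and kept, and all sizes, are illustrative). Each pass creates many short-lived character vectors with strsplit() and then keeps one larger numeric vector alive. The sketch only reproduces the pattern; it does not measure fragmentation, which depends on the memory allocator and would have to be observed from the operating system side.

```r
set.seed(1)
# 200 hypothetical "lines" of text, each with 200 single-letter words
lines <- replicate(200, paste(sample(letters, 200, replace = TRUE), collapse = " "))

kept <- vector("list", 50)
for (i in seq_along(kept)) {
  # Short-lived small allocations: one character vector per line, discarded
  # as soon as its length has been taken.
  counts <- vapply(lines, function(s) length(strsplit(s, " ", fixed = TRUE)[[1]]),
                   integer(1), USE.NAMES = FALSE)
  # Long-lived larger allocation: stays reachable through `kept`.
  kept[[i]] <- numeric(1e4) + sum(counts)
}
```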
The only way to fix the problem is to save the objects you want, restart R, and reload the objects.
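The save, restart, reload cycle might look like this, assuming results is the object worth keeping and sim_keep.RData is a hypothetical file in the working directory. save() and load() are base R; the restart itself happens outside the script.

```r
# Hypothetical object worth keeping across the restart.
results <- data.frame(run = 1:3, estimate = c(0.12, 0.15, 0.11))

keep_file <- "sim_keep.RData"   # hypothetical file name in the working directory
save(results, file = keep_file)

# Restart R here, e.g. quit() and relaunch, or Session > Restart R in RStudio.
# The new R process starts without the old, fragmented heap.

load(keep_file)   # recreates `results` in the calling environment
```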
Since you are doing rm(list=ls()), i.e., you do not need any objects, you do not need to save and reload anything, so in your case the solution is precisely what you want to avoid: restarting R.
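One possible way to automate the restart, which goes beyond the answer above, is to run each simulation in a short-lived worker process from base R's parallel package. The helper run_in_fresh_r and the toy simulate_once are hypothetical. Only the return value is copied back to the main session; when the worker is stopped its process exits, and the operating system reclaims all of its memory regardless of fragmentation.

```r
library(parallel)

# Hypothetical helper: evaluate fun(...) in a brand-new R process.
run_in_fresh_r <- function(fun, ...) {
  cl <- makePSOCKcluster(1)             # launches one new Rscript worker (works on Windows)
  on.exit(stopCluster(cl), add = TRUE)  # worker process exits when we are done
  clusterCall(cl, fun, ...)[[1]]        # clusterCall returns a list, one element per worker
}

# Toy stand-in for one simulation run.
simulate_once <- function(n) {
  x <- matrix(rnorm(n * 100), ncol = 100)
  colMeans(x)
}

# Each run gets its own process; only the 100 column means come back.
res <- lapply(1:3, function(i) run_in_fresh_r(simulate_once, n = 1e4))
```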
PS1. Garbage collection is a highly non-trivial topic. E.g., Ruby used 5 (!) different GC algorithms over 20 years. Java GC does not suck because Sun/Oracle and IBM spent many programmer-years on their respective implementations of the GC. On the other hand, R and Python have lousy GC, because no one bothered to invest the necessary programmer-years, and yet they are quite popular. That's worse-is-better for you.
PS2. Related: R: running out of memory using `strsplit`