I'm handling some large datasets and am doing what I can to stay under R's memory limits. One question came up regarding the overwriting of R objects. I have a large data.table (or any R object), and it has to be copied to tmp multiple times. The question is: does it make any difference if I delete tmp before overwriting it? In code:
for (i in 1:lots_of_times) {
  v_l_d_t_tmp <- copy(very_large_data_table) # Necessary copy of 7GB data
                                             # table on 16GB machine. I can
                                             # afford 2 but not 3 copies.
  ### do stuff to v_l_d_t_tmp and output
  rm(v_l_d_t_tmp) # The question is whether this rm keeps max memory
                  # usage lower, or if it is equivalent to what an
                  # overwrite will automatically do on the next iteration.
}
Assume the copy is necessary (if I reach a point where I need to read very_large_data_table from disk at each loop, I'll do that, but the question stands: will it make any difference on max memory usage if I explicitly delete v_l_d_t_tmp before loading into it again?). Or, to teach a man to fish, what could I have typed (within R; let's not get into ps) to answer this myself?
It's totally OK if the answer turns out to be: "Trust garbage collection."
R uses an alternative approach: garbage collection (or GC for short). GC automatically releases memory when an object is no longer used. It does this by tracking how many names point to each object, and when there are no names pointing to an object, it deletes that object.
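As a minimal base-R sketch of this behaviour, reg.finalizer registers a callback that runs when its object is collected, so you can watch the collector reclaim an object once no name points to it any more:

e <- new.env()
reg.finalizer(e, function(x) message("e was collected"))
rm(e)            # no name points to the environment any more
invisible(gc())  # force a collection; the finalizer's message prints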
The upshot is that R already manages deallocation for you, and its garbage collector and memory management have been refined over many years.
A clarification on gc(): calling gc() does not itself remove objects from memory. It triggers a garbage collection, which reclaims the memory of objects that are no longer reachable, and reports memory usage. Its optional reset parameter (gc(reset = TRUE)) resets the "max used" statistics, and the returned table includes the maximum memory used in Mb since the last reset.
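For example (a small sketch; the exact numbers will vary by session):

gc(reset = TRUE)    # reset the "max used" statistics to current usage
x <- runif(1e7)     # allocate roughly 80 MB
rm(x)
gc()[, "max used"]  # peak usage (Ncells/Vcells rows) since the reset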
Generally speaking, R stores and manipulates all objects in the physical memory of your computer (i.e. the RAM). Therefore, it's important to be aware of the limits of your computing environment with respect to available memory and how that may affect your ability to use R.
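If you want to check how much memory a single object occupies, base R's object.size() is one option (very_large_data_table here stands in for the object from the question):

format(object.size(very_large_data_table), units = "GB")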
Here's another idea... it doesn't directly answer your question, instead tries to get around it by eliminating the memory problem in another way. Might get you thinking:
What if you instead cache the very_large_data_table, and then read it in just once, do what you need to do, and then exit R. Now, write a loop outside of R, and the memory problem vanishes. Granted, this costs you more CPU because you have to read in 7GB multiple times... but it might be worth saving the memory costs. In fact, this halves your memory use, since you don't have to ever copy the table.
In addition, like @konvas pointed out in the comments, I too found that rm(), even with gc(), never got me what I needed with a long loop; memory would just accumulate and eventually bog down. Exiting R is the easy way out.
I had to do this so often that I wrote a package to help me cache objects like this: simpleCache
if you're interested in trying, it would look something like this:
do this outside of R (a bash version of the loop; lots_of_times is a placeholder for however many iterations you need):

for i in $(seq 1 "$lots_of_times"); do
  Rscript my_script.R
done
Then in R, do this... my_script.R:

library(simpleCache)
simpleCache("very_large_data_table", {
  # R code for how you make this table
}, assignTo = "v_l_d_t_tmp")
### do stuff to v_l_d_t_tmp and output
This is a comment more than an answer, but it is becoming too long.

I guess that in this case a call to rm might be proper. I think that, starting from the second iteration, you may have 3 tables in memory if you don't call rm. While copying the large object, R cannot free the memory occupied by v_l_d_t_tmp before the end of the copy, since the function call may raise an error, and in that case the old object must be preserved. Consider this example:
x <- 1:10
myfunc <- function(y) { Sys.sleep(3); 30 }
Here I defined an object and a function that takes some time to do something. If you try:
x <- myfunc()
and break the execution before it ends "naturally", the object x still exists, with its 1:10 content. So, I guess that in your case, even if you use the same symbol, R cannot free its content before or during the copy. It can if you remove it before the following copy. Of course, the object will be removed after the copy, but you may run out of memory during it.
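To answer the "teach a man to fish" part: one way to measure this yourself, entirely within R, is to compare gc's "max used" statistics, as a sketch, assuming the vectors are large enough to dominate the numbers:

x <- runif(5e7)      # a vector of roughly 400 MB

gc(reset = TRUE)     # reset the "max used" counters
x <- runif(5e7)      # plain overwrite: the old x stays reachable until
                     # the new vector is fully built, so both coexist
gc()[, "max used"]

gc(reset = TRUE)
rm(x)                # drop the old object first...
x <- runif(5e7)      # ...so at most one copy needs to be alive at a time
gc()[, "max used"]   # in principle this peak should be lower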
I'm not by any means an expert on R internals, so don't take what I just said for granted.