
Forcing garbage collection to run in R with the gc() command

Periodically I program sloppily. OK, I program sloppily all the time, but sometimes that catches up with me in the form of out-of-memory errors. I start exercising a little discipline in deleting objects with the rm() command, and things get better. I see mixed messages online about whether I should explicitly call gc() after deleting large data objects. Some say that R will run gc() itself before it returns a memory error, while others say that manually forcing gc is a good idea.

Should I run gc() after deleting large objects in order to ensure maximum memory availability?
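A minimal sketch of the rm()-then-gc() pattern the question describes, assuming base R only. The helper `used_mb()` is hypothetical: it sums column 2 of the matrix that gc() returns, which is the memory R reports as in use (in Mb) for the Ncells and Vcells rows. It shows R's own accounting; whether the freed memory actually goes back to the operating system depends on the platform.

```r
# Hypothetical helper: total memory R reports as in use, in Mb.
# gc() runs a collection and returns a matrix; column 2 is "used (Mb)"
# for the Ncells and Vcells rows.
used_mb <- function() sum(gc()[, 2])

before <- used_mb()
x <- numeric(1e7)        # 1e7 doubles, roughly 76 Mb
with_x <- used_mb()      # x is still bound, so it cannot be collected

rm(x)                    # drop the only binding to the vector
after <- used_mb()       # this gc() call reclaims the now-unreachable vector

cat(sprintf("before: %.1f Mb, with x: %.1f Mb, after rm + gc: %.1f Mb\n",
            before, with_x, after))
```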

asked Sep 23 '09 16:09 by JD Long



2 Answers

"Probably." I do it too, and often even in a loop as in

cleanMem <- function(n = 10) { for (i in seq_len(n)) gc() }

Yet that does not, in my experience, restore memory to a pristine state.

So what I usually do is keep the tasks at hand in script files and execute them using the 'r' frontend (on Unix, from the 'littler' package). Rscript is an alternative on that other OS.

That workflow happens to agree with

  • workflow-for-statistical-analysis-and-report-writing
  • tricks-to-manage-the-available-memory-in-an-r-session

which we covered here before.
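A rough sketch of the "run it as a script in a separate process" idea, assuming base R only and using Rscript (the alternative mentioned above) rather than littler's 'r'. The throwaway script and its contents are hypothetical; the point is that whatever the child process allocates is released when it exits, independent of the parent session's heap.

```r
# Hypothetical task: a throwaway script that allocates a large vector.
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- numeric(1e7)         # large allocation, made in the child process",
  "writeLines(format(sum(x)))"
), script)

# Run the script in a fresh R process via Rscript. Everything it allocated
# is returned to the operating system when that process exits.
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(shQuote(rscript), shQuote(script), stdout = TRUE)
print(out)      # the script's standard output, one element per line
unlink(script)  # clean up the temporary script
```

Capturing stdout keeps only the results in the interactive session; the intermediate objects never exist there at all.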

answered Sep 22 '22 09:09 by Dirk Eddelbuettel


From the help page on gc:

A call of 'gc' causes a garbage collection to take place. This will also take place automatically without user intervention, and the primary purpose of calling 'gc' is for the report on memory usage.

However, it can be useful to call 'gc' after a large object has been removed, as this may prompt R to return memory to the operating system.
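Since the help page says the main purpose of calling gc() is its memory report, here is a small sketch of using that report, assuming base R only. `peak_mb()` is a hypothetical helper: `gc(reset = TRUE)` resets the "max used" statistics to the current values, the expression is then evaluated, and the last column of the next gc() report ("max used", in Mb) gives the peak R recorded in the meantime.

```r
# Hypothetical helper: peak memory (Mb) R reports around evaluating expr.
peak_mb <- function(expr) {
  gc(reset = TRUE)         # reset "max used" to the current usage
  force(expr)              # evaluate the lazily passed expression; the
                           # promise keeps its value alive until we return
  res <- gc()
  sum(res[, ncol(res)])    # last column is "max used" in Mb
}

# Allocating a 1e7-element double vector (about 76 Mb)
p <- peak_mb(numeric(1e7))
cat(sprintf("peak memory reported: %.1f Mb\n", p))
```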

So it can be useful to do, but mostly you shouldn't have to. My personal opinion is that it is code of last resort: you shouldn't be littering your code with gc() statements as a matter of course, but if your machine keeps falling over and you've tried everything else, then it might be helpful.

By everything else, I mean things like:

  1. Writing functions rather than raw scripts, so variables go out of scope.

  2. Emptying your workspace if you go from one problem to another unrelated one.

  3. Discarding data/variables that you aren't interested in. (I frequently receive spreadsheets with dozens of uninteresting columns.)
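A minimal sketch of points 1 and 3 together, assuming base R only; the function name, the 20-column table, and the column choice are hypothetical. Large intermediates live only inside the function call, and the uninteresting columns are dropped before any further work.

```r
# Hypothetical example: a wide table where only a couple of columns matter.
summarise_cols <- function(n, keep) {
  # 'wide' and 'narrow' exist only inside this call; once it returns they are
  # unreachable and the next garbage collection can reclaim them.
  wide <- as.data.frame(matrix(runif(n * 20), ncol = 20))  # columns V1..V20
  narrow <- wide[, keep, drop = FALSE]                       # discard the rest
  colMeans(narrow)
}

res <- summarise_cols(1e5, keep = c("V1", "V2"))
print(res)

# Nothing was left behind in the global workspace
exists("wide", envir = globalenv(), inherits = FALSE)  # FALSE

# Moving on to an unrelated problem? rm(list = ls()) empties the workspace.
```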

answered Sep 21 '22 09:09 by Richie Cotton