
R garbage collection parameter explanation

Tags: r

In the most recent release of R (3.5.0), a new parameter (full) has been added for garbage collection.

gc(verbose = getOption("verbose"), reset = FALSE, full = TRUE)

Whatever has been done, it appears to have solved my memory issues. My problem, though, is that I don't understand the documentation: it is written for 'experts', and for mere mortals like me it is pretty hard to understand what it means.

Can anyone explain in a bit more detail what the following parameter descriptions actually mean, and what difference it makes whether I select TRUE or FALSE for each?

reset logical; if TRUE the values for maximum space used are reset to the current values.

full logical; if TRUE a full collection is performed; otherwise only more recently allocated objects may be collected.

The full documentation is at:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/gc.html

Thanks in advance.

user9774387 asked May 21 '18 09:05


1 Answer

The memory R allocates for vectors is reported as Vcells (vector memory is allocated in multiples of 8 bytes), and the memory allocated for other objects is reported as Ncells (28 bytes each on 32-bit systems and 56 bytes each on 64-bit systems). At the start of an R session, before any objects have been created, gc() outputs:

gc(verbose=TRUE, full=TRUE)
# Garbage collection 2 = 0+0+2 (level 2) ... 
#13.4 Mbytes of cons cells used (41%)
#3.7 Mbytes of vectors used (6%)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 249316 13.4     608371 32.5   407443 21.8
#Vcells 484528  3.7    8388608 64.0  1607721 12.3

The "used" column reports how many cells are currently allocated, and the column after it reports the same amount in megabytes. The "gc trigger" column reports the threshold at which the next garbage collection will be triggered. The "max used" column reports the maximum memory used since R started or since the last call to gc(reset = TRUE). So, for example, if we create new objects we will see the change in the output of gc():

x <- as.list(1:10000)
y <- 1:100000
gc(verbose=TRUE, full=TRUE)
#Garbage collection 3 = 0+0+3 (level 2) ... 
#13.9 Mbytes of cons cells used (43%)
#3.9 Mbytes of vectors used (6%)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 259262 13.9     608371 32.5   407443 21.8
#Vcells 504438  3.9    8388608 64.0  1607721 12.3
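
As a rough check, the (Mb) figures in these tables can be reproduced from the cell counts. This is only a sketch: it assumes a 64-bit build (56 bytes per Ncell, 8 bytes per Vcell) and assumes, as the gc help page states, that the sizes are rounded up to the next 0.1 Mb. The helper name cells_to_mb is hypothetical and not part of R.

```r
# Hypothetical helper: convert a cell count to Mb, rounded up to 0.1 Mb.
cells_to_mb <- function(cells, bytes_per_cell) {
  ceiling(cells * bytes_per_cell / 1024^2 * 10) / 10
}

cells_to_mb(249316, 56)  # Ncells "used" in the first table: 13.4
cells_to_mb(484528, 8)   # Vcells "used" in the first table: 3.7
```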

Create a large vector with floats and then delete it:

z <- seq(from=0.0, to=1.0, by=0.0000001)
gc(verbose=TRUE, full=TRUE)
#Garbage collection 6 = 0+0+6 (level 2) ... 
#14.0 Mbytes of cons cells used (43%)
#80.2 Mbytes of vectors used (39%)
#           used (Mb) gc trigger  (Mb) max used  (Mb)
#Ncells   261151 14.0     608371  32.5   407443  21.8
#Vcells 10508970 80.2   26703045 203.8 20512469 156.5

rm(z)
gc(verbose=TRUE, full=TRUE)
#Garbage collection 8 = 0+0+8 (level 2) ... 
#14.0 Mbytes of cons cells used (43%)
#3.9 Mbytes of vectors used (2%)
#         used (Mb) gc trigger  (Mb) max used  (Mb)
#Ncells 261391 14.0     608371  32.5   407443  21.8
#Vcells 509467  3.9   21362436 163.0 20512469 156.5

The garbage collector recovers memory that is no longer in use. The values in the "gc trigger" column are the thresholds at which a collection is triggered automatically. Calling gc() yourself forces a collection immediately, which may prompt R to return memory to the operating system, for example after a large object has been removed.
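
The table that gc() prints is also returned as a numeric matrix, so the values can be read programmatically. A minimal sketch, assuming the row and column names shown in the output above (the variable name usage is just for illustration):

```r
usage <- gc()                  # numeric matrix with rows "Ncells" and "Vcells"

usage["Vcells", "used"]        # vector cells currently in use
usage["Vcells", "gc trigger"]  # threshold that triggers the next collection
usage["Vcells", "max used"]    # peak recorded so far
```

Indexing by name rather than by position avoids problems if an extra limit column is present.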

You can change the maxima that are set for Ncells and/or Vcells. See https://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory.html. That page gives in-depth information not only about memory usage in R but also some insight into how the garbage collector works.

As for your main question about the full = TRUE option: according to the R documentation, use it if you want a more accurate report of memory usage. It forces a full garbage collection (and as a result might take a little more time). Otherwise R might perform only a partial collection, releasing memory for only the more recently allocated objects.
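
As for the reset parameter from the question, here is a small sketch of what it does, based on the documented behaviour that reset = TRUE sets the "max used" values to the current usage. The object name big and its size are arbitrary choices for illustration.

```r
big <- numeric(5e6)         # roughly 40 MB of doubles
rm(big)

before <- gc()              # "max used" should still reflect the earlier peak
after  <- gc(reset = TRUE)  # "max used" is reset to the current usage

before["Vcells", c("used", "max used")]
after["Vcells", c("used", "max used")]
```

So reset only affects the bookkeeping in the "max used" columns, which is handy when you want to measure the peak memory of one particular piece of code. It does not change how much memory is freed.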

By the way, as you can see from the latest version of the R documentation (?Memory), the way memory management is performed in R depends very much on your OS. For example: On Windows the --max-mem-size option (or environment variable R_MAX_MEM_SIZE) sets the maximum (virtual) memory allocation: it has a minimum allowed value of 32M. This is intended to catch attempts to allocate excessive amounts of memory which may cause other processes to run out of resources. See also memory.limit.

Again, I would read the documentation for the memory topic in R (?Memory) very carefully. Make sure you read the version that corresponds to the R version on your system, since there are some changes between versions.

There are two more links worth reading for more information on how the garbage collector works in R: http://homepage.stat.uiowa.edu/~luke/R/barrier.html and http://homepage.stat.uiowa.edu/~luke/R/gengcnotes.html

Since a call to the garbage collector is a relatively expensive operation, you do not want to put it inside loops. Ideally, you remove objects you no longer need in your script with rm() and let R decide when to perform garbage collection.

One more note: RStudio starts R with its own settings. For example, here is the output from a regular R session:
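
A minimal sketch of that pattern (the loop body and the names results and tmp are made up for illustration):

```r
results <- numeric(3)
for (i in seq_along(results)) {
  tmp <- rnorm(1e5)        # temporary working data
  results[i] <- mean(tmp)
  rm(tmp)                  # drop the reference; no explicit gc() in the loop
}
```

The memory held by tmp becomes eligible for collection after rm(), and R reclaims it the next time a collection is triggered.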

gc(verbose=TRUE, full=TRUE)
#Garbage collection 2 = 0+0+2 (level 2) ... 
#13.4 Mbytes of cons cells used (41%)
#3.7 Mbytes of vectors used (6%)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 249316 13.4     608371 32.5   407443 21.8
#Vcells 484528  3.7    8388608 64.0  1607721 12.3

memory.limit()
#[1] 7888

memory.size()
#[1] 29.02

Here is the output from an R session within RStudio:

gc(verbose=TRUE, full=TRUE)
#Garbage collection 19 = 15+1+3 (level 2) ... 
#32.6 Mbytes of cons cells used (45%)
#9.3 Mbytes of vectors used (14%)
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  610199 32.6    1369865 73.2  1369865 73.2
#Vcells 1210415  9.3    8388608 64.0  1842725 14.1

memory.limit()
#[1] 7888

memory.size()
#[1] 99.49
Katia answered Oct 09 '22 10:10