I am using the blackboost function from the mboost package to estimate a model on an approximately 500 MB dataset on a Windows 7 64-bit machine with 8 GB of RAM. During execution, R uses up virtually all available memory. After the calculation is done, over 4.5 GB remains allocated to R, even after calling the garbage collector with gc() or saving and reloading the workspace into a new R session. Using .ls.objects (from SO question 1358003) I found that the combined size of all visible objects is about 550 MB.
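A minimal stand-in for .ls.objects (a sketch only, not the exact helper from that question) just lists the visible objects by size:

## simplified .ls.objects: sizes of visible objects in MB, largest first
obj_sizes <- sapply(ls(envir = .GlobalEnv),
                    function(x) object.size(get(x, envir = .GlobalEnv)))
round(sort(obj_sizes, decreasing = TRUE) / 1024^2, 1)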
The output of gc() tells me that the bulk of the data is held in vector cells (Vcells), although I'm not sure what that means:
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   2856967  152.6    4418719  236.0   3933533  210.1
Vcells 526859527 4019.7  610311178 4656.4 558577920 4261.7
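(For context: Ncells count R's internal nodes and language objects, 56 bytes each on a 64-bit build, while Vcells count the 8-byte cells that hold vector data, so the Vcell figures convert to MB like this:)

## each Vcell is 8 bytes, so the Vcell counts from gc() convert to MB as:
gc()["Vcells", "used"] * 8 / 1024^2
## e.g. 526859527 Vcells is roughly 4020 MB, matching the (Mb) column above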
This is what I'm doing:
> memory.size()
[1] 1443.99
> model <- blackboost(formula, data = mydata[mydata$var == 1,c(dv,ivs)],tree_control=ctree_control(maxdepth = 4))
...a bunch of packages are loaded...
> memory.size()
[1] 4431.85
> print(object.size(model),units="Mb")
25.7 Mb
> memory.profile()
        NULL      symbol    pairlist     closure environment     promise    language
           1       15895      826659       20395        4234       13694      248423
     special     builtin        char     logical     integer      double     complex
         174        1572     1197774       34286       84631       42071          28
   character         ...         any        list  expression    bytecode externalptr
      228592           1           0       79877           1       51276        2182
     weakref         raw          S4
         413         417        4385
mydata[mydata$var == 1, c(dv, ivs)] has 139593 rows and 75 columns, mostly factor variables plus some logical or numeric variables. formula is a formula object of the form "dv ~ var2 + var3 + .... + var73". dv is a character string with the name of the dependent variable, and ivs is a character vector with all independent variables var2 ... var74.
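The formula was presumably built from the dv and ivs strings roughly like this (an assumption; the original post does not show that step):

## hypothetical construction of the formula object from dv and ivs
formula <- as.formula(paste(dv, "~", paste(ivs, collapse = " + ")))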
Why is so much memory being allocated to R? How can I make R free up the extra memory? Any thoughts appreciated!
There are two functions for removing specific objects from the R workspace: rm() and remove(). They are exactly the same function, so you can use whichever you prefer; for example, rm(x) removes the object x from the workspace.
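A quick illustration (the object names x and y are made up for the example):

x <- 1:10
y <- 1:10
rm(x)                  # removes x from the workspace
remove(y)              # same effect; remove() and rm() are the same function
identical(rm, remove)  # TRUE: the two are the same function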
For memory management more generally, R relies on garbage collection (GC). GC automatically releases memory when an object is no longer used: R keeps track of how many names point to each object, and once no names point to an object, its memory can be reclaimed.
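A small sketch of that behaviour (the vector size is arbitrary):

big <- rnorm(50e6)   # roughly 400 MB of doubles
print(gc())          # Vcells "used" goes up accordingly
rm(big)              # no name points to the vector any more
print(gc())          # the collector reclaims it and "used" drops back down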
The console can be cleared with the keyboard shortcut Ctrl+L. Note that this only clears the console display; it does not remove objects or free memory.
I think another option is to open the Environment pane in RStudio and change the view from List to Grid at the top right of the pane. Then tick the objects you want to clear and click Clear. Alternatively, tick the checkbox next to Name, which selects all objects, and then untick the objects you want to keep.
I have talked to one of the package authors, who told me that much of the data associated with the model object is stored in environments, which explains why object.size() does not reflect the full memory usage that the blackboost call induces. He also told me that the mboost package is aimed at flexibility rather than speed or memory efficiency, and that all trees, and thereby the data, are saved, which explains the large amounts of memory consumed (I still find the dimensions remarkable...). He recommended either using the gbm package (with which I have not yet been able to replicate my results) or "serializing" the estimation, i.e. running the boosting iterations in chunks, by doing something like this:
### first M_1 iterations
mod <- blackboost(...)[M_1]
f1 <- fitted(mod)
rm(mod)
### then M_2 additional iterations ...
mod <- blackboost(..., offset = f1)[M_2]
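A self-contained sketch of that pattern on made-up toy data (the data, iteration counts, and the use of default blackboost settings here are illustrative, not from the original model):

library(mboost)

set.seed(1)
toy <- data.frame(y = rnorm(200), x1 = rnorm(200), x2 = rnorm(200))
M_1 <- 50
M_2 <- 50

### first M_1 iterations; indexing with [M_1] sets the number of boosting iterations
mod <- blackboost(y ~ x1 + x2, data = toy)[M_1]
f1 <- fitted(mod)
rm(mod)
invisible(gc())   # reclaim the memory held by the first model

### M_2 additional iterations, continuing from the previous fit via the offset
mod <- blackboost(y ~ x1 + x2, data = toy, offset = f1)[M_2]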