
How can I remove invisible objects from an R workspace that are not removed by garbage collection?

I am using the blackboost function from the mboost package to estimate a model on a dataset of approximately 500 MB, on a 64-bit Windows 7 machine with 8 GB of RAM. During execution R uses virtually all available memory. After the calculation is done, over 4.5 GB remains allocated to R, even after calling the garbage collector with gc() or saving the workspace and reloading it in a new R session. Using .ls.objects (1358003) I found that the size of all visible objects is about 550 MB.
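
For readers without the linked helper, the total size of the visible objects can be approximated roughly as sketched below. The function name visible_sizes and the demo environment are made up for illustration. Note that object.size only gives an estimate and does not follow environments.

```r
# Hypothetical helper: estimated size in bytes of every object in an environment,
# largest first. Defaults to the global workspace.
visible_sizes <- function(env = globalenv()) {
  nms <- ls(envir = env)
  sizes <- vapply(
    nms,
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Small demonstration on a throwaway environment (illustrative data only)
demo_env <- new.env()
assign("big_vec", numeric(1e5), envir = demo_env)
assign("small_int", 1L, envir = demo_env)

sizes <- visible_sizes(demo_env)
print(sizes)                 # bytes per object
print(sum(sizes) / 1024^2)   # total in Mb
```

Calling visible_sizes() with no argument would do the same for the objects in the current workspace.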

The output of gc() tells me that the bulk of the data is in vector cells, although I'm not sure what that means:

            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   2856967  152.6    4418719  236.0   3933533  210.1
Vcells 526859527 4019.7  610311178 4656.4 558577920 4261.7
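
As a hedged aside on reading this table: according to ?gc, Vcells are 8-byte units of vector storage (the actual contents of numeric, integer, character and similar vectors), while Ncells are the fixed-size cells used for language objects and pairlists. The sketch below converts the "used" Vcells figure above to megabytes and shows Vcells rising and falling around a large allocation. The variable names are illustrative only.

```r
# "used" Vcells copied from the gc() output above; each Vcell is 8 bytes
vcells_used <- 526859527
vcells_mb <- vcells_used * 8 / 1024^2
print(vcells_mb)   # close to the (Mb) column reported by gc()

# Vcells grow when vector data is allocated and shrink once it is unreachable
before <- gc()["Vcells", "used"]
x <- numeric(1e7)              # 1e7 doubles, i.e. about 1e7 Vcells
during <- gc()["Vcells", "used"]
rm(x)
after <- gc()["Vcells", "used"]

print(c(before = before, during = during, after = after))
```

If Vcells stay high after rm() and gc(), something reachable is still holding the vector data, even if it does not show up in ls().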

This is what I'm doing:

> memory.size()
[1] 1443.99
> model <- blackboost(formula, data = mydata[mydata$var == 1,c(dv,ivs)],tree_control=ctree_control(maxdepth = 4))

...a bunch of packages are loaded...

> memory.size()
[1] 4431.85
> print(object.size(model),units="Mb")
25.7 Mb
> memory.profile()
     NULL      symbol    pairlist     closure environment     promise    language 
        1       15895      826659       20395        4234       13694      248423 
  special     builtin        char     logical     integer      double     complex 
      174        1572     1197774       34286       84631       42071          28 
character         ...         any        list  expression    bytecode externalptr 
   228592           1           0       79877           1       51276        2182 
  weakref         raw          S4 
      413         417        4385 

mydata[mydata$var == 1,c(dv,ivs)] has 139593 rows and 75 columns, mostly factor variables plus some logical and numerical variables. formula is a formula object of the form "dv ~ var2 + var3 + .... + var73". dv is a string holding a variable name and ivs is a character vector with all independent variables var2 ... var74.

Why is so much memory being allocated to R? How can I make R free up the extra memory? Any thoughts appreciated!

asked Nov 02 '12 13:11 by Nima




1 Answer

I talked to one of the package authors, who told me that much of the data associated with the model object is stored in environments. This explains why object.size does not reflect the full memory usage that the blackboost function induces in R. He also told me that the mboost package is aimed at flexibility rather than optimized for speed and memory efficiency, and that all trees are saved, and with them the data, which explains the large amount of data generated (I still find the dimensions remarkable). He recommended either using the gbm package (which I have not yet been able to get to replicate my results) or serializing the fit, by doing something like this:

### first M_1 iterations
mod <- blackboost(...)[M_1]
f1 <- fitted(mod)
rm(mod)
### then M_2 additional iterations ...
mod <- blackboost(..., offset = f1)[M_2]
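
The point about environments can be illustrated with a toy example. This is not mboost's actual internals, only a minimal sketch under the assumption that a model object holds closures. The names make_model and big are hypothetical. object.size does not count the environment a closure refers to, so data captured there stays invisible to it while still occupying memory.

```r
# Toy stand-in for a model object that keeps its data in a closure environment
make_model <- function(n) {
  big <- rnorm(n)                        # data lives in this function's environment
  list(predict = function() mean(big))   # the closure keeps `big` reachable
}

model <- make_model(1e6)

# Size as reported for the visible object (environment not counted)
reported <- object.size(model)

# Size of the data hidden in the closure's environment
hidden <- object.size(get("big", envir = environment(model$predict)))

print(reported, units = "Mb")
print(hidden, units = "Mb")

# Dropping the last reference lets the garbage collector reclaim the data
rm(model)
invisible(gc())
```

This is consistent with the staged approach above: only the fitted values are kept between stages, and removing mod releases whatever its environments were holding.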

answered Oct 20 '22 00:10 by Nima