
remove a temporary environment variable and release memory in R

Tags: memory, r

I am working on a job in which a temporary hash table is used repeatedly inside a loop. The hash table is represented by an environment object in R. The problem is that memory use keeps rising as the loop proceeds, no matter how I delete the table: I tried rm() and gc(), but neither freed the memory. As a consequence, I cannot complete a very long loop, say 10M cycles. It looks like a memory leak, but I have not found a solution elsewhere. What is the correct way to completely remove an environment and, at the same time, release all the memory it previously occupied? Thanks in advance for your help.

Here is a very simple example. I am using Windows 8 and R version 3.1.0.

> fun = function(){
    H = new.env()
    for(i in rnorm(100000)){
      H[[as.character(i)]] = rnorm(100)
    }
    rm(list=names(H), envir=H, inherits=FALSE)
    rm(H)
    gc()
  }
> 
> for(k in 1:5){
    print(k)
    fun()
    gc()
    print(memory.size(F))
  }
[1] 1
[1] 40.43
[1] 2
[1] 65.34
[1] 3
[1] 82.56
[1] 4
[1] 100.22
[1] 5
[1] 120.36
Jiexing Wu asked Aug 09 '15 18:08



1 Answer

Environments in R are not a good choice for situations where the keys can vary a lot during the computation. The reason is that environments require keys to be symbols, and symbols are not garbage collected. So each run of your function is adding to the internal symbol table. Arranging for symbols to be garbage collected would be one possibility, though care would be needed since a lot of internals code assumes they are not. Another option would be to create better hash table support so environments don't have to try to serve this purpose for which they were not originally designed.
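As a rough illustration of working around this, here is a minimal sketch that keeps the keyed data in a named list rather than an environment. List names are ordinary character strings, not symbols, so the keys are not added to the symbol table. The helpers `make_store` and `run_once` are hypothetical names invented for this example, and the sketch assumes the keys are known before the values are filled in. It is not a real hash table: lookup by name is a linear scan, so it only suits moderately sized tables.

```r
# Hypothetical helper: build a keyed store as a named list.
# Names are stored as character strings, so no symbols are created for the keys.
make_store <- function(keys, make_value) {
  store <- vector("list", length(keys))      # preallocate one slot per key
  for (j in seq_along(keys)) {
    store[[j]] <- make_value(keys[j])        # fill by position
  }
  names(store) <- keys                       # attach the keys as names
  store
}

# Hypothetical driver mirroring the loop in the question, with a smaller n.
run_once <- function(n = 1000) {
  keys <- as.character(rnorm(n))
  store <- make_store(keys, function(k) rnorm(100))
  length(store[[keys[1]]])                   # look up one value by its string key
}

for (k in 1:3) {
  print(run_once())
  invisible(gc())                            # the list is unreachable here and can be collected
}
```

Whether this is fast enough depends on the table size and how often lookups happen. If the keys can be mapped to integers, indexing a preallocated list or vector by position avoids the name scan entirely.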

Luke Tierney answered Oct 13 '22 10:10
