I am working on a job in which a temporary hash table is used repeatedly inside a loop. The hash table is represented by an environment in R. The problem is that memory usage keeps rising as the loop proceeds, no matter how I delete the table (I tried rm() and gc(), but neither freed the memory). As a consequence I cannot complete a very long loop, say 10M iterations. It looks like a memory leak, but I have not found a solution elsewhere. What is the correct way to completely remove an environment and release all the memory it occupied? Thanks in advance for your help.
Here is a very simple example. I am using Windows 8 and R version 3.1.0.
> fun = function(){
    H = new.env()
    for(i in rnorm(100000)){
      H[[as.character(i)]] = rnorm(100)
    }
    rm(list=names(H), envir=H, inherits=FALSE)
    rm(H)
    gc()
  }
>
> for(k in 1:5){
    print(k)
    fun()
    gc()
    print(memory.size(F))
  }
[1] 1
[1] 40.43
[1] 2
[1] 65.34
[1] 3
[1] 82.56
[1] 4
[1] 100.22
[1] 5
[1] 120.36
Environments in R are not a good choice when the set of keys varies a lot during the computation. The reason is that environments require keys to be symbols, and symbols are not garbage collected. So each run of your function adds to the internal symbol table, and neither rm() nor gc() can reclaim that memory. Arranging for symbols to be garbage collected would be one possibility, though care would be needed since a lot of internal code assumes they are not. Another option would be to create better hash table support, so that environments don't have to serve a purpose for which they were not originally designed.
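One possible workaround, as a minimal sketch only: keep the temporary table in a named list and look keys up with match() on names(). As far as I know, list names are ordinary character strings rather than symbols, so they can be collected once the list is gone. The helper names make_table and lookup are hypothetical, and the sketch assumes the keys are known up front and that a linear-time lookup is acceptable, so it is not a drop-in replacement for a real hash table.

```r
# Sketch: a throwaway key/value table stored in a named list, not an environment.
# make_table() and lookup() are hypothetical helpers for illustration.

make_table <- function(n = 1000L, width = 100L) {
  keys   <- as.character(rnorm(n))                       # keys stay plain strings
  values <- replicate(n, rnorm(width), simplify = FALSE) # one numeric vector per key
  setNames(values, keys)                                 # build the table in one step
}

lookup <- function(tab, key) {
  idx <- match(key, names(tab))      # linear scan over the names
  if (is.na(idx)) NULL else tab[[idx]]
}

for (k in 1:5) {
  tab <- make_table()
  first_key <- names(tab)[1]
  stopifnot(identical(lookup(tab, first_key), tab[[1]]))
  rm(tab)          # drop the whole table
  invisible(gc())  # keys and values are now unreferenced
}
```

Building the list in one call avoids growing it element by element, which would copy the list on every insertion. If the keys really must be inserted one at a time with fast lookup, this sketch does not cover that case.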