
R memory management - increasing memory consumption

My code looks as follows (it is a slightly simplified version of the original, but it still reflects the problem).

require(VGAM)

Median.sum  = vector(mode="numeric", length=75) 
AA.sum      = vector(mode="numeric", length=75)                                                    
BB.sum      = vector(mode="numeric", length=75)                   
Median      = array(0, dim=c(75 ,3)) 
AA          = array(0, dim=c(75 ,3))                                                    
BB          = array(0, dim=c(75 ,3))                              

y.sum     = vector(mode="numeric", length=100000)
y         = array(0, dim=c(100000,3))
b.size    = vector(mode="numeric", length=3) 
c.size    = vector(mode="numeric", length=3) 


for (h in 1:40)
{
  for (j in 1:75)
  {  
    for (i in 1:100000)
    {
      y.sum[i] = 0

      for (f in 1:3)
      {
        b.size[f] = rbinom(1, 30, 0.9)
        c.size[f] = 30 - rbinom(1, 30, 0.9) + 1
        y[i, f] = sum( rlnorm(b.size[f], 8.5, 1.9) ) + 
          sum( rgpd(c.size[f], 120000, 1870000, 0.158) )
        y.sum[i] = y.sum[i] + y[i, f]
      }
    }

    Median.sum[j] = median(y.sum)
    AA.sum[j] = mean(y.sum)
    BB.sum[j] = quantile(y.sum, probs=0.85)

    for (f in 1:3)
    {
      Median[j,f] = median(y[,f])
      AA[j,f] = mean(y[,f])
      BB[j,f] = quantile(y[,f], probs=0.85)
    }
  }
  #gc()
}

It breaks in the middle of its execution (h=7, j=1, i=93065) with an error:

Error: cannot allocate vector of size 526.2 Mb

Right after getting this message I read this, this & this, but that was not enough. Neither calling the garbage collector (gc()) nor clearing all the objects from the workspace helps. I tried both inside the loop, at the place where #gc() is: calling gc(), and removing all the variables and declaring them again (the latter is not included in the code I posted).

This seems strange to me, as the whole procedure uses the same objects in each step of the loop and so should consume the same amount of memory in each step. Why does the memory consumption increase over time?

To make matters worse, if I keep working in the same R session and even run:

rm(list=ls())
gc()

I still get the same error message, even when I declare something small like:

abc = array(0, dim=c(10,3))

Only closing R and starting a new session helps. Why? Is there perhaps some way to recode my loop?

R: 2.15.1 (32-bit), OS: Windows XP (32-bit)

I am quite new here, so every tip is appreciated! Thanks in advance.


Edit: (From Arun). I find this behaviour even easier to reproduce with a simple example. Start a new R session, copy and paste this code, and watch the memory grow in your system monitor.

mm <- rep(0, 1e4) # initialise a vector
for (i in 1:1e3) {
    for (j in 1:1e3) {
        for (k in 1:1e4) {
            mm[k] <- k # already pre-allocated
         }
    }
}
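
As a complement to the system monitor, memory can also be logged from inside the session. The following is only a sketch: the helper name vcells_mb and the scaled-down loop bounds are assumptions made for illustration, not part of the original example. It reads the "Vcells" row of the table returned by gc(), which reports the memory held by R vectors in Mb. Note that calling gc() itself triggers a collection, so the figure reflects live objects after collection rather than the process footprint shown by the operating system.

```r
# Hypothetical helper: memory currently held by R vectors, in Mb.
# gc() runs a collection and returns a matrix with rows "Ncells" and "Vcells";
# column 2 is the "(Mb)" figure for memory in use.
vcells_mb <- function() {
  gc()["Vcells", 2]
}

mm <- rep(0, 1e4)        # same pre-allocated vector as in the example
n_outer <- 5             # scaled down from 1e3 so the sketch finishes quickly
usage <- numeric(n_outer)

for (i in seq_len(n_outer)) {
  for (j in 1:10) {      # scaled down from 1e3
    for (k in 1:1e4) {
      mm[k] <- k
    }
  }
  usage[i] <- vcells_mb()  # record usage after each outer iteration
}

print(usage)
```

If these figures stay flat while the system monitor shows growth, the growth is not coming from live R objects, which would be consistent with the observation that rm(list=ls()) does not help.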
asked Mar 24 '13 11:03 by brunner


2 Answers

Add a call to gc() within the for (i in 1:100000) loop.

Adding a call to gc() within the tight loop of Arun's code removes its memory growth.

This shows memory growth:

mm <- rep(0, 1e4) # initialise a vector
for (i in 1:1e3) {
    for (j in 1:1e3) {
        for (k in 1:1e4) {
            mm[k] <- k # already pre-allocated
         }
     }
 }

This does not:

mm <- rep(0, 1e4) # initialise a vector
for (i in 1:1e3) {
    for (j in 1:1e3) {
        for (k in 1:1e4) {
            mm[k] <- k # already pre-allocated
            gc()
         }
     }
 }

Something is awry with the automatic garbage collection here. The collector is being called in the first case, as gcinfo(TRUE) indicates, yet the memory still grows very quickly.
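
Calling gc() on every innermost iteration is expensive. A possible compromise, sketched below, is to call it only every so many iterations. The interval gc_every and the counter n_gc are hypothetical names introduced here, and the loop bounds are scaled down. Whether a sparser schedule is still enough to stop the growth on the setup from the question (R 2.15.1, 32-bit, Windows XP) has not been verified.

```r
mm <- rep(0, 1e4)        # pre-allocated vector, as in the reproduction above
gc_every <- 1000L        # assumption: collect once per 1000 assignments
n_gc <- 0L               # counts explicit collections, for inspection only

for (i in 1:2) {         # scaled down from the nested 1e3 x 1e3 loops
  for (k in seq_along(mm)) {
    mm[k] <- k
    if (k %% gc_every == 0L) {
      gc()               # explicit collection at a fixed interval
      n_gc <- n_gc + 1L
    }
  }
}

n_gc
```

A smaller gc_every moves the behaviour closer to the second variant above, at the cost of more time spent collecting.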

answered Nov 07 '22 05:11 by Matthew Lundberg


This seems to work (putting the innermost loop into a function). I did not run it to the end because it was too slow, but I did not notice memory inflation like in your code.

require(VGAM)

Median.sum  = vector(mode="numeric", length=75) 
AA.sum      = vector(mode="numeric", length=75)                                                    
BB.sum      = vector(mode="numeric", length=75)                   
Median      = array(0, dim=c(75 ,3)) 
AA          = array(0, dim=c(75 ,3))                                                    
BB          = array(0, dim=c(75 ,3))                              


inner.fun <- function() {
  y.sum     = vector(mode="numeric", length=100000)
  y         = array(0, dim=c(100000,3))
  b.size    = vector(mode="numeric", length=3) 
  c.size    = vector(mode="numeric", length=3) 
  for (i in 1:100000)
  {
    y.sum[i] = 0

    for (f in 1:3)
    {
      b.size[f] = rbinom(1, 30, 0.9)
      c.size[f] = 30 - rbinom(1, 30, 0.9) + 1
      y[i, f] = sum( rlnorm(b.size[f], 8.5, 1.9) ) + 
        sum( rgpd(c.size[f], 120000, 1870000, 0.158) )
      y.sum[i] = y.sum[i] + y[i, f]
    }
  }
  list(y.sum, y)
}

for (h in 1:40)
{
  cat("\nh =", h,"; j = ")
  for (j in 1:75)
  {  
    cat(j," ")
    result = inner.fun()
    y.sum = result[[1]]
    y = result[[2]]
    Median.sum[j] = median(y.sum)
    AA.sum[j] = mean(y.sum)
    BB.sum[j] = quantile(y.sum, probs=0.85)

    for (f in 1:3)
    {
      Median[j,f] = median(y[,f])
      AA[j,f] = mean(y[,f])
      BB[j,f] = quantile(y[,f], probs=0.85)
    }
  }
}
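
The same idea can be written with a named list as the return value, so the caller does not depend on element positions. This is only a sketch under stated assumptions: simulate_block and its arguments are hypothetical names, the sample size is reduced, and the rgpd() term from VGAM is left out so that the example runs with base R alone.

```r
# Hypothetical variant of inner.fun(): all working objects are local to the
# function and are released when it returns.
simulate_block <- function(n, n_groups = 3) {
  y <- matrix(0, nrow = n, ncol = n_groups)
  for (i in seq_len(n)) {
    for (f in seq_len(n_groups)) {
      b.size <- rbinom(1, 30, 0.9)
      # Lognormal part only; the GPD part of the original is omitted here.
      y[i, f] <- sum(rlnorm(b.size, 8.5, 1.9))
    }
  }
  # Row totals replace the running y.sum[i] accumulation.
  list(y.sum = rowSums(y), y = y)
}

set.seed(1)
result <- simulate_block(n = 1000)   # reduced from 100000 for illustration
median(result$y.sum)
quantile(result$y[, 1], probs = 0.85)
```

Accessing result$y.sum and result$y by name plays the same role as result[[1]] and result[[2]] in the loop above.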
answered Nov 07 '22 03:11 by Bogumil Kaminski