Is there a way to release memory during parallel computation in R?

Suppose I want to run an R program on multiple cores as follows:

library(foreach)
library(doParallel)

no_cores <- detectCores() - 2

cl<-makeCluster(no_cores, outfile = "debug.txt")

registerDoParallel(cl)

result <- foreach(i = 10:100, 
        .combine = list,
        .multicombine = TRUE)  %dopar%  {

          set.seed(i)

          a <- replicate(i, rnorm(20)) 
          b <- replicate(i, rnorm(20))

          list(x = a + b, y = a - b)

        } 

However, I found that memory usage kept increasing as the program ran. I think the program does not release the old objects, so I tried adding gc():

result <- foreach(i = 10:100, 
        .combine = list,
        .multicombine = TRUE)  %dopar%  {

          set.seed(i)

          a <- replicate(i, rnorm(20)) 
          b <- replicate(i, rnorm(20))

          list(x = a + b, y = a - b)
          gc()

        } 

This seems to work, but I no longer get the result I want. I then tried collecting the garbage at the start of each iteration instead, but that does not seem to help either:

result <- foreach(i = 10:100, 
        .combine = list,
        .multicombine = TRUE)  %dopar%  {
          gc()
          set.seed(i)

          a <- replicate(i, rnorm(20)) 
          b <- replicate(i, rnorm(20))

          list(x = a + b, y = a - b)    
        } 

Is there a way to solve this problem? Any suggestion would be appreciated. PS: this code is just to reproduce the issue; my real simulation program is much more complex, so I do not want to change the program structure too much.


1 Answer

I don't think you encountered any so-called "memory leak": later iterations of your foreach simply create bigger objects. If your question is whether gc() is actually helpful, I recommend reading the memory usage chapter of Advanced R by Hadley Wickham, in which he states:

Despite what you might have read elsewhere, there’s never any need to call gc() yourself

Anyhow, I tried to track down the possible memory leak in your code by splitting it into the three variants you described. Note that each variant returns the result list explicitly, so a trailing gc() call can no longer replace the loop's return value, which is why your second attempt did not give you the result you wanted.

library(foreach)
library(doParallel)  
f1 <- function(uu = 10:100){
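  # f1: call gc() just before the explicit return inside each iteration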
  no_cores <- detectCores() - 2
  cl<-makeCluster(no_cores)
  registerDoParallel(cl)
  result1 <- foreach(i = uu, .combine = list, 
                     .multicombine = TRUE)  %dopar%  {
                      set.seed(i)
                      a <- replicate(i, rnorm(20)) 
                      b <- replicate(i, rnorm(20))
                      gc()
                      return(list(x = a + b, y = a - b))
                      } 
  stopCluster(cl)
  return(result1)
}
f2 <- function(uu = 10:100){
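  # f2: call gc() at the start of each iteration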
  no_cores <- detectCores() - 2
  cl<-makeCluster(no_cores)
  registerDoParallel(cl)
  result1 <- foreach(i = uu, .combine = list, 
                     .multicombine = TRUE)  %dopar%  {
                       gc()
                       set.seed(i)
                       a <- replicate(i, rnorm(20)) 
                       b <- replicate(i, rnorm(20))
                       return(list(x = a + b, y = a - b))
                     } 
  stopCluster(cl)
  return(result1)
}
f3 <- function(uu = 10:100){
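  # f3: baseline with no gc() call at all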
  no_cores <- detectCores() - 2
  cl<-makeCluster(no_cores)
  registerDoParallel(cl)
  result1 <- foreach(i = uu, .combine = list,  .multicombine = TRUE)  %dopar%  {
                       set.seed(i)
                       a <- replicate(i, rnorm(20)) 
                       b <- replicate(i, rnorm(20))
                       return(list(x = a + b, y = a - b))
                     } 
  stopCluster(cl)
  return(result1)
}

library(pryr)
mem_used() # 214 MB
mem_change(NULL) # 864 B
gc() # whatever
mem_change({res1 <- f1(); rm(res1)}) # 2.11 kB
mem_change({res1 <- f2(); rm(res1)}) # 2.11 kB
mem_change({res1 <- f3(); rm(res1)}) # 2.11 kB
mem_change({res1 <- f1(10:250); rm(res1)}) # 2.11 kB
mem_change({res1 <- f2(10:250); rm(res1)}) # 2.11 kB
mem_change({res1 <- f3(10:250); rm(res1)}) # 2.11 kB

Besides that, I ran profvis on the three functions with the default input (10:100) and got the following overall timings and memory figures:

[profvis flame graphs for f1(), f2(), and f3() omitted]

I would trust the profvis results for time rather than for memory. In general, I would not rely on gc() to free up space inside parallel loops.
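
For reference, here is a minimal sketch of how those profiles can be reproduced (assuming f1(), f2(), and f3() are defined as above). The exact numbers will differ by machine, and keep in mind that profvis only samples the master process, not the parallel workers:

library(profvis)

p1 <- profvis(f1())   # variant with gc() just before the return
p2 <- profvis(f2())   # variant with gc() at the start of each iteration
p3 <- profvis(f3())   # baseline with no gc()
p1                    # printing the object opens the interactive time/memory profile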
