Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

controlling seeds with mclapply

Imagine we are doing a number of processes where i want to set one overall seed at the beginning of a program: e.g.

mylist <- list( as.list(rep(NA,3)), as.list(rep(NA,3)) )
foo <- function(x){  for(i in 1:length(x)){ 
                       x[[i]] <- sample(100,1)
                         }
                      return(x) 
                     } 

# start block
set.seed(1)
l1 <- lapply(mylist, foo)
l2 <- lapply(mylist, foo)
# end

of course within a block l1 and l2 are different, but if i run the above block again l1 will be the same as before and l2 will be the same as before.

Imagine foo is horribly time consuming so i want to use mclapply not lapply, so i do:

library(parallel)

# start block
set.seed(1)
mclapply(mylist , foo,  mc.cores = 3)
mclapply(mylist , foo,  mc.cores = 3)
# end

If i run this block again I will get different results the next time. How do I produce the same behaviour as with setting one overall seed with lapply but using mclappy. I have looked through mclapply doc but I am not sure because using:

set.seed(1)
l1 <-  mclapply(mylist , foo,  mc.cores = 3, mc.set.seed=FALSE)
l2 <-  mclapply(mylist , foo,  mc.cores = 3, mc.set.seed=FALSE)

result in l1 and l2 being the same, which is not what I want...

like image 442
user1320502 Avatar asked May 26 '15 10:05

user1320502


1 Answers

The parallel package comes with special support for the "L'Ecuyer-CMRG" random number generator which was introduced at the same time as parallel. You can read the documentation for that support using:

library(parallel)
?mc.reset.stream

To use it, you first need to enable "L'Ecuyer-CMRG":

RNGkind("L'Ecuyer-CMRG")

After doing that, code such as:

set.seed(1)
mclapply(mylist, foo, mc.cores=3)
mclapply(mylist, foo, mc.cores=3)

will be reproducible, but the two calls to mclapply will return identical results. This is because the state of the random number generator in the master process isn't changed by calling mclapply.

I've used the following function to skip over the random number streams used by the mclapply workers:

skip.streams <- function(n) {
  x <- .Random.seed
  for (i in seq_len(n))
    x <- nextRNGStream(x)
  assign('.Random.seed', x, pos=.GlobalEnv)
}

You can use this function to get the behavior that I think you want:

set.seed(1)
mclapply(mylist, foo, mc.cores=3)
skip.streams(3)
mclapply(mylist, foo, mc.cores=3)
skip.streams(3)
like image 129
Steve Weston Avatar answered Sep 22 '22 09:09

Steve Weston