Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R doesn't reset the seed when "L'Ecuyer-CMRG" RNG is used?

I was doing some parallel simulations in R and I notice that the seed is not changed when the "L'Ecuyer-CMRG" rng is used. I was reading the book "Parallel R", and the option mc.set.seed = TRUE should give each worker a new seed each time mclapply() is called.

Here is my code:

library(parallel)
RNGkind("L'Ecuyer-CMRG")

mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE)
[[1]]
[1] -0.7125037

[[2]]
[1] -0.9013552  0.3445190

mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE)
[[1]]
[1] -0.7125037

[[2]]
[1] -0.9013552  0.3445190

EDIT: same thing happens both on my desktop and on my laptop (both Ubuntu 12.04 LTS).

like image 856
Matteo Fasiolo Avatar asked Feb 25 '13 15:02

Matteo Fasiolo


2 Answers

It appears to me that if you want to guarantee that subsequent calls to mclapply in an R session get different random numbers, you need to either call set.seed with a different value, remove the global variable ".Random.seed", or generate at least one random number in that R session before calling mclapply again.

The reason for this behavior is that mclapply (unlike mcparallel for example) calls mc.reset.stream internally. This resets the seed that is stashed in the "parallel" package to the value of ".Random.seed", so if ".Random.seed" hasn't changed when mclapply is called again, the workers started by mclapply will get the same random numbers as they did previously.

Note that this is not the case with functions such as clusterApply and parLapply, since they use persistent workers, and therefore continue to draw random numbers from their RNG stream. But new workers are forked every time mclapply is called, presumably making it much harder to have that type of behavior.

Here's an example of setting the seed to different values in order to get different random numbers using mclapply:

RNGkind("L'Ecuyer-CMRG")
set.seed(100)
mclapply(1:2, function(i) rnorm(2))
set.seed(101)
mclapply(1:2, function(i) rnorm(2))

Here's an example of removing ".Random.seed":

RNGkind("L'Ecuyer-CMRG")
mclapply(1:2, function(i) rnorm(2))
rm(.Random.seed)
mclapply(1:2, function(i) rnorm(2))

And here's an example of generating random numbers on the master:

RNGkind("L'Ecuyer-CMRG")
mclapply(1:2, function(i) rnorm(2))
rnorm(1)
mclapply(1:2, function(i) rnorm(2))

I'm not sure which is the best approach, but that may depend on what you're trying to do.

Although it appears that simply calling mclapply multiple times without changing ".Random.seed" results in reproducible results, I don't know if that is guaranteed. To guarantee reproducible results, I think you need to call set.seed:

RNGkind("L'Ecuyer-CMRG")
set.seed(1234)
mclapply(1:2, function(i) rnorm(2))
set.seed(1234)
mclapply(1:2, function(i) rnorm(2))
like image 195
Steve Weston Avatar answered Sep 27 '22 19:09

Steve Weston


Have you considered using clusterApply instead of mclapply? I think it's easier to guarantee reproducible examples using this framework. I'm attaching an example.

library(parallel)

#---- creating local cluster ----

clust <- makeCluster(detectCores())

#---- seed ----

RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1234)
s <- .Random.seed
clusterSetRNGStream(cl = clust, iseed = s)

#---- generating random numbers ----

clusterApply(cl = clust, x = 1:2, fun = function(i) rnorm(2))
#> [[1]]
#> [1] -0.0514128  0.7344781
#> 
#> [[2]]
#> [1]  0.3946233 -0.6649782
clusterApply(cl = clust, x = 1:2, fun = function(i) rnorm(2))
#> [[1]]
#> [1] -1.57875323 -0.07533176
#> 
#> [[2]]
#> [1] -0.3698359 -0.1564795

#---- repeating the process ----

set.seed(1234)
s <- .Random.seed
clusterSetRNGStream(cl = clust, iseed = s)

clusterApply(cl = clust, x = 1:2, fun = function(i) rnorm(2))
#> [[1]]
#> [1] -0.0514128  0.7344781
#> 
#> [[2]]
#> [1]  0.3946233 -0.6649782
clusterApply(cl = clust, x = 1:2, fun = function(i) rnorm(2))
#> [[1]]
#> [1] -1.57875323 -0.07533176
#> 
#> [[2]]
#> [1] -0.3698359 -0.1564795

# same result

Created on 2019-03-04 by the reprex package (v0.2.1)

devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/Sao_Paulo           
#>  date     2019-03-04
#> Packages -----------------------------------------------------------------
#>  package   * version date       source         
#>  backports   1.1.2   2017-12-13 CRAN (R 3.5.0) 
#>  base      * 3.5.1   2018-07-03 local          
#>  compiler    3.5.1   2018-07-03 local          
#>  datasets  * 3.5.1   2018-07-03 local          
#>  devtools    1.13.6  2018-06-27 CRAN (R 3.5.0) 
#>  digest      0.6.17  2018-09-12 cran (@0.6.17) 
#>  evaluate    0.11    2018-07-17 cran (@0.11)   
#>  graphics  * 3.5.1   2018-07-03 local          
#>  grDevices * 3.5.1   2018-07-03 local          
#>  htmltools   0.3.6   2017-04-28 CRAN (R 3.5.0) 
#>  knitr       1.20    2018-02-20 CRAN (R 3.5.0) 
#>  magrittr    1.5     2014-11-22 cran (@1.5)    
#>  memoise     1.1.0   2017-04-21 CRAN (R 3.5.0) 
#>  methods   * 3.5.1   2018-07-03 local          
#>  parallel  * 3.5.1   2018-07-03 local          
#>  Rcpp        0.12.18 2018-07-23 cran (@0.12.18)
#>  rmarkdown   1.10    2018-06-11 CRAN (R 3.5.1) 
#>  rprojroot   1.3-2   2018-01-03 CRAN (R 3.5.0) 
#>  stats     * 3.5.1   2018-07-03 local          
#>  stringi     1.2.4   2018-07-20 cran (@1.2.4)  
#>  stringr     1.3.1   2018-05-10 CRAN (R 3.5.0) 
#>  tools       3.5.1   2018-07-03 local          
#>  utils     * 3.5.1   2018-07-03 local          
#>  withr       2.1.2   2018-03-15 CRAN (R 3.5.0) 
#>  yaml        2.2.0   2018-07-25 cran (@2.2.0)
like image 36
lcgodoy Avatar answered Sep 27 '22 17:09

lcgodoy