
R: set.seed() results don't match if caret package loaded

I am using createFolds() in R (version: 3.3.0) to create train/test partitions. To make results reproducible, I used set.seed() with a seed value of 10. As expected, the results (generated folds) were reproducible.

But when I loaded the caret package just after setting the seed and then called createFolds(), the created folds were different (although still reproducible).

Specifically, the created folds differ in the following two cases:

Case 1:

library(caret)
set.seed(10)
folds=createFolds(y,k=5,returnTrain=TRUE)

Case 2:

set.seed(10)
library(caret)
folds=createFolds(y,k=5,returnTrain=TRUE)

where y is a vector.

Why could this be happening?

Inderdeep Singh asked Jul 19 '16 17:07


1 Answer

The culprit is ggplot2, which is attached when you load caret. It defines an .onAttach function: https://github.com/hadley/ggplot2/blob/master/R/zzz.r

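This function is called when the package is attached; see help("ns-hooks"). Within it, runif is called, which advances the state of the RNG. In Case 2 the seed is set before library(caret), so that draw is consumed between set.seed() and createFolds(); in Case 1 the seed is set after the package is attached, so createFolds() starts from the freshly seeded state.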
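The effect can be reproduced without caret or ggplot2. The following is a minimal sketch in base R, where a plain runif(1) call stands in for the draw made inside the attach hook (the variable names are illustrative, not from either package):

```r
# Case 1 analogue: nothing touches the RNG between set.seed() and the draw.
set.seed(10)
a <- sample(100, 5)

# Re-seeding and repeating the same call reproduces the same result.
set.seed(10)
a_again <- sample(100, 5)

# Case 2 analogue: an extra draw happens after seeding.
set.seed(10)
state_before <- .Random.seed   # RNG state right after seeding
invisible(runif(1))            # stand-in for the draw made by the attach hook
state_after <- .Random.seed    # RNG state has advanced
b <- sample(100, 5)            # drawn from the advanced state

print(identical(a, a_again))                 # same seed, same state, same result
print(identical(state_before, state_after))  # the extra draw changed the state
print(a)
print(b)
```

Because b is drawn from an advanced RNG state, it should generally differ from a, while remaining reproducible across runs. A simple way to avoid the issue is to attach all packages first and call set.seed() immediately before the code that needs reproducible random numbers, as in Case 1.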

Roland answered Oct 03 '22 09:10