I am using createFolds() in R (version: 3.3.0) to create train/test partitions. To make the results reproducible, I used set.seed() with a seed value of 10. As expected, the results (generated folds) were reproducible.
But when I loaded the caret package just after setting the seed and then called createFolds, I found that the created folds were different (although still reproducible).
Specifically, the created folds differ in the following two cases:
Case 1:
library(caret)
set.seed(10)
folds=createFolds(y,k=5,returnTrain=TRUE)
Case 2:
set.seed(10)
library(caret)
folds=createFolds(y,k=5,returnTrain=TRUE)
where y is a vector.
Why could this be happening?
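The culprit is ggplot2, which is attached when you load caret. It defines an .onAttach function: https://github.com/hadley/ggplot2/blob/master/R/zzz.r
This function is called when the package is attached (see help("ns-hooks")), and within it runif is called, which advances the state of the RNG. So in Case 2 the RNG is no longer in its freshly seeded state when createFolds runs, whereas in Case 1 the seed is set after the package has been attached. Each case is reproducible on its own, but the two give different folds.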