Sometimes I want to write a randomized function that always returns the same output for a particular input. I've always implemented this by setting the random seed at the top of the function and then proceeding. Consider two functions defined in this way:
sample.12 <- function(size) {
set.seed(144)
sample(1:2, size, replace=TRUE)
}
rand.prod <- function(x) {
set.seed(144)
runif(length(x)) * x
}
sample.12
returns a vector of the specified size randomly sampled from the set {1, 2}
and rand.prod
multiplies each element of a specified vector by a random value uniformly selected from [0, 1]
. Normally I would expect x <- sample.12(10000) ; rand.prod(x)
to have a "step" distribution with pdf 3/4 in the range [0, 1]
and 1/4 in the range [1, 2]
, but due to my unfortunate choice of identical random seeds above I see a different result:
x <- sample.12(10000)
hist(rand.prod(x))
I can fix this issue in this case by changing the random seed in one of the functions to some other value. For instance, with set.seed(10000)
in rand.prod
I get the expected distribution:
Previously on SO this solution of using different seeds has been accepted as the best approach to generate independent random number streams. However, I find the solution to be unsatisfying because streams with different seeds could be related to one another (possibly even highly related to one another); in fact, they might even yield identical streams according to ?set.seed
:
There is no guarantee that different values of seed will seed the RNG differently, although any exceptions would be extremely rare.
Is there a way to implement a pair of randomized functions in R that:
Seed function is used to save the state of a random function, so that it can generate same random numbers on multiple executions of the code on the same machine or on different machines (for a specific seed value). The seed value is the previous value number generated by the generator.
Step 1: Type “=RANDBETWEEN(a,b)” into an empty cell, where a,b is the range you want to pick numbers from. For example, =RANDBETWEEN(1,100) will generate a random number between 1 and 100. Step 2: Press “ENTER.” This generates a random number between the number you specified.
You can specify the random seed by using the SEED= option in the PROC GMM statement. The seed value is used in the random number generator, which initializes the cluster assignment to start the iterations of the variational Bayesian algorithm.
Syntax: set.seed(123) In the above line,123 is set as the random number value. The main point of using the seed is to be able to reproduce a particular sequence of 'random' numbers. and sed(n) reproduces random numbers results by seed. For more information for set.
I've dug into this some more and it looks like the rlecuyer
package provides independent random streams:
Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L'Ecuyer et al (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.
The first step is global initialization of the independent streams:
library(rlecuyer)
.lec.CreateStream(c("stream.12", "stream.prod"))
Then each function needs to be modified to reset the appropriate stream to its beginning state (.lec.RestartStartStream
), set the R random number generator to the appropriate stream (.lec.CurrentStream
), and afterward set the R random number generator back to its state before the function was called (.lec.CurrentStreamEnd
).
sample.12 <- function(size) {
.lec.ResetStartStream("stream.12")
.lec.CurrentStream("stream.12")
x <- sample(1:2, size, replace=TRUE)
.lec.CurrentStreamEnd()
x
}
rand.prod <- function(x) {
.lec.ResetStartStream("stream.prod")
.lec.CurrentStream("stream.prod")
y <- runif(length(x)) * x
.lec.CurrentStreamEnd()
y
}
This satisfies the "always returns the same output given the same input" requirement:
all.equal(rand.prod(sample.12(10000)), rand.prod(sample.12(10000)))
# [1] TRUE
The streams also appears to operate independently in our example:
x <- sample.12(10000)
hist(rand.prod(x))
Note that this would not give consistent values across runs of our script because each call to .lec.CreateStream
would give a different initial state. To address this, we could note the initial state for each stream:
.lec.GetState("stream.12")
# [1] 3161578179 1307260052 2724279262 1101690876 1009565594 836476762
.lec.GetState("stream.prod")
# [1] 596094074 2279636413 3050913596 1739649456 2368706608 3058697049
We can then change the stream initialization at the beginning of the script to:
library(rlecuyer)
.lec.CreateStream(c("stream.12", "stream.prod"))
.lec.SetSeed("stream.12", c(3161578179, 1307260052, 2724279262, 1101690876, 1009565594, 836476762))
.lec.SetSeed("stream.prod", c(596094074, 2279636413, 3050913596, 1739649456, 2368706608, 3058697049))
Now calls to sample.12
and rand.prod
will match across calls to the script.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With