R: bizarre behavior of set.seed()

Tags: random, r, seed

Something odd happens in R when I call set.seed(0) and set.seed(1):

set.seed(0)
sample(1:100,size=10,replace=TRUE)
# [1] 90 27 38 58 91 21 90 95 67 63


set.seed(1)
sample(1:100,size=10,replace=TRUE)
# [1] 27 38 58 91 21 90 95 67 63  7

Changing the seed from 0 to 1 gives exactly the same sequence, shifted over by one position!

With set.seed(2), however, I get what appears to be a completely different (random?) vector:

set.seed(2)
sample(1:100,size=10,replace=TRUE)
# [1] 19 71 58 17 95 95 13 84 47 55

Does anyone know what's going on here?

asked Feb 11 '14 at 20:02 by bigO6377



2 Answers

This applies to the R implementation of the Mersenne-Twister RNG.

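set.seed() takes the provided seed and scrambles it 50 times with the linear congruential step seed = 69069 * seed + 1 (in the C function RNG_Init). Here seed is an Int32, R's unsigned 32-bit type, so the arithmetic wraps modulo 2^32: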

for(j = 0; j < 50; j++)
  seed = (69069 * seed + 1);

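The scrambled seed is then stepped 625 more times, and each successive value is stored to fill the initial state of the Mersenne-Twister (RNG_Table[kind].n_seed is 625 for this generator: one position slot plus 624 state words):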

for(j = 0; j < RNG_Table[kind].n_seed; j++) {
  seed = (69069 * seed + 1);
  RNG_Table[kind].i_seed[j] = seed;
}

We can examine the initial state for the RNG using .Random.seed:

set.seed(0)
x <- .Random.seed

set.seed(1)
y <- .Random.seed

table(x %in% y)

The table shows a lot of overlap: almost every value in x also appears in y. Compare this with seed 3:

set.seed(3)
z <- .Random.seed

table(z %in% x)
table(z %in% y)

Going back to seeds 0 and 1, look at the state itself, skipping the first two elements of .Random.seed (for Mersenne-Twister, the first encodes the RNG kind and the second is the current position in the state). The two states are offset by one position: x[4:10] equals y[3:9].

x[3:10]
# 1280795612 -169270483 -442010614 -603558397 -222347416 1489374793  865871222
# 1734802815

y[3:10] 
# -169270483 -442010614 -603558397 -222347416 1489374793  865871222 1734802815
# 98005428

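The offset comes from the scrambling loop itself: 69069 * 0 + 1 = 1, so after a single step seed 0 has become seed 1, and every value RNG_Init generates from seed 0 is the value it generates from seed 1 one step earlier. Since the values selected by sample() are based on these numbers, the draws are shifted by one position as well, which is the odd behavior in the question.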

answered Oct 08 '22 at 01:10 by Christopher Louden


As the other answer shows, seeds 0 and 1 result in very similar initial states. In addition, the Mersenne Twister PRNG has a severe limitation: "almost similar initial states will take a long time to diverge".

It is therefore advisable to use alternatives such as the WELL PRNG (available in the randtoolbox package).

answered Oct 08 '22 at 02:10 by Nishanth