
How is the seed chosen if not set by the user?

For the purpose of reproducibility, one has to choose a seed. In R, we can use set.seed(). My question is, when the seed is not set explicitly, how does the computer choose the seed? Why is there no default seed?

John Smith asked Aug 25 '18 12:08



1 Answer

A pseudo-random number generator (PRNG) needs a start value (the seed), which you can set with set.seed(). If none is given, the generator usually derives one from information available on the computer, such as the current time, the CPU temperature or something similar. If you want a more random start value, you can use physical sources such as white noise or nuclear decay, but this generally requires an external source of information.
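To make the role of the start value concrete, here is a minimal sketch of a toy linear congruential generator. This is not the generator R uses (R's default is Mersenne-Twister); the function name lcg and the constants are assumptions chosen only for illustration.

```r
# Toy linear congruential generator (illustration only, NOT R's default PRNG).
# `lcg` and its constants are hypothetical choices for this sketch.
lcg <- function(seed, n, a = 1664525, c = 1013904223, m = 2^32) {
  out <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    # The next state depends only on the previous state,
    # so the seed fixes the whole sequence.
    state <- (a * state + c) %% m
    out[i] <- state / m  # scale to [0, 1)
  }
  out
}

same_seed_1 <- lcg(seed = 1, n = 5)
same_seed_2 <- lcg(seed = 1, n = 5)
other_seed  <- lcg(seed = 2, n = 5)

identical(same_seed_1, same_seed_2)  # TRUE: same seed, same sequence
identical(same_seed_1, other_seed)   # FALSE: different seed, different sequence
```

The generator itself is fully deterministic. Any unpredictability has to come from how the seed is chosen.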

The documentation states that R uses the current time and the process ID:

Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored.
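You can observe this behaviour yourself. The sketch below checks for .Random.seed, the object in which R keeps the generator state in the global environment. The helper name has_seed is an assumption for this example, and the existing state is removed first to mimic a fresh session.

```r
# Helper (hypothetical name): does the global environment hold an RNG state?
has_seed <- function() {
  exists(".Random.seed", envir = globalenv(), inherits = FALSE)
}

# Drop any existing state to mimic a fresh session; R then creates a new
# seed itself the next time a random number is needed.
if (has_seed()) rm(".Random.seed", envir = globalenv())

before <- has_seed()  # no seed yet
x <- runif(1)         # the first draw forces R to create a seed
after <- has_seed()   # the state now exists

print(c(before = before, after = after))
```

Running this in two separate sessions should give different values of x, because each session creates its own seed.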

A default seed is a bad idea, because the random number generator would then produce the same sequence of numbers in every session by default. If you always use the same seed, the output is no longer randomized: you simply get a fixed data sample, which is not the intended output of a PRNG. You could of course turn the default seed off (if there were one), but the primary purpose of a PRNG is to generate data that looks random, not a fixed set.
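The effect of a fixed seed is easy to see. In this small sketch the seed value 42 and the variable names are arbitrary choices:

```r
set.seed(42)
first <- rnorm(5)

set.seed(42)          # resetting to the same seed restarts the sequence
second <- rnorm(5)

third <- rnorm(5)     # no reset: the sequence simply continues

identical(first, second)  # TRUE: same seed, same numbers
identical(second, third)  # expected to be FALSE: later draws differ
```

A built-in default seed would behave like the first two draws in every session, which is why R leaves the seed unset until you ask for reproducibility.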

For statistical work, this matters for validation and verification, but it becomes even more important in cryptography, where a good PRNG is mandatory.

mischva11 answered Oct 03 '22 17:10