I'm trying to replicate in R a bit of code someone else wrote in Stata, and have hit a wall trying to predict the behavior of their p-RNG.
Their code has this snippet:
set seed 123456
Unfortunately, it's unclear exactly which algorithm Stata uses. This question suggests it's a KISS algorithm, but the asker didn't manage to replicate the numbers in the end (and some of the links there seem to be dead or outdated). The Stata manual entry for set seed doesn't mention anything about algorithms, and this question doesn't seem to have been resolved either.
Is it a fool's errand to try and replicate Stata's random numbers?
I don't know which version of Stata was used to create this.
Stata in fact has ten random-number functions: runiform() generates rectangularly (uniformly) distributed random numbers over [0,1). rbeta(a, b) generates beta(a, b) distributed random numbers.
A random seed (or seed state, or just seed) is a number (or vector) used to initialize a pseudorandom number generator. For a seed to be used in a pseudorandom number generator, it does not need to be random.
set seed # specifies the initial value of the random-number seed used by the random-number functions, such as runiform() and rnormal(). set seed statecode resets the state of the random-number functions to the value specified, which is a state previously obtained from creturn value c(seed).
In short: Yes, it is a fool's errand.
Stata, being proprietary software, hasn't released all of the details of its core components, such as its random-number generator. However, documentation is available (link for Stata 14), most pertinently:
runiform() is the basis for all the other random-number functions because all the other random-number functions transform uniform (0, 1) random numbers to the specified distribution.
runiform() implements the Mersenne Twister 64-bit (MT64) and the “keep it simple stupid” 32-bit (KISS32) algorithms for generating uniform (0, 1) random numbers. runiform() uses the MT64 algorithm by default.
runiform() uses the KISS32 algorithm only when the user version is less than 14 or when the random-number generator has been set to kiss32
...
Recall also from ?Random in R that, for the Mersenne Twister:
The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.
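Stata builds the analogous internal state itself from the number passed to set seed, and that state should be nearly impossible to guess. Note also that R's Mersenne Twister is the 32-bit variant, whereas Stata's default is the 64-bit MT64, so the two are not the same generator even before seeding comes into it.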
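To see what that state looks like on the R side, here is a minimal sketch in base R. It only inspects R's own Mersenne Twister state and says nothing about how Stata seeds its generator; the seed 123456 is reused from the question purely for illustration.

```r
# R's default generator, set explicitly so the sketch does not depend on session settings.
RNGkind("Mersenne-Twister")

set.seed(123456)
state <- get(".Random.seed", envir = globalenv())

# One code for the generator kind, one position counter, then the 624 integers.
length(state)

draws_a <- runif(3)

# Reseeding with the same number restarts the same stream within R.
set.seed(123456)
draws_b <- runif(3)
identical(draws_a, draws_b)
```

The single integer given to set.seed() is expanded into the full state by R's own scrambling routine, so matching another program's stream would require matching both the generator and that expansion step.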
I suggest you export these random numbers from Stata and read them into a vector/matrix/etc. in R using
library(haven)
mydata <- read_dta("mydata.dta")
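As a rough sketch of how the imported draws might then be used, the example below round-trips a tiny file through haven so that it runs on its own. The column name u, the three values, and the temporary file are all hypothetical stand-ins; in practice the .dta file would be written by Stata and would hold the numbers generated there.

```r
library(haven)

# Stand-in for the file Stata would write (hypothetical column name and values).
path <- tempfile(fileext = ".dta")
write_dta(data.frame(u = c(0.25, 0.5, 0.75)), path)

# Read the draws back and pull them out as a plain numeric vector.
mydata <- read_dta(path)
u <- as.numeric(mydata$u)

# Use the imported values wherever the R code would otherwise call runif().
n <- length(u)
u
```

Reading the numbers in this way sidesteps the generator entirely: the R code consumes exactly the values Stata produced, whichever algorithm and version produced them.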