If you run code like:
length(unique(runif(10000000)))
length(unique(rnorm(10000000)))
you'll see that only about 99.8% of the runif values are unique, but 100% of the rnorm values are. I thought this might be because of the constrained range, but widening the range to (0, 100000) for runif doesn't change the result. Continuous distributions should have zero probability of repeats. I know that isn't the case in floating-point arithmetic, but I'm curious why the two functions don't show roughly the same number of repeats.
This is due primarily to the properties of the default PRNG (the fact that runif has a smaller range than rnorm and therefore a smaller number of representable values may also have a similar effect at some point even if the RNG doesn't). It is discussed somewhat obliquely in ?Random:
Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values (Wichmann-Hill is the exception, and all give at least 30 varying bits.)
With the example:
sum(duplicated(runif(1e6))) # around 110 for default generator
## and we would expect about almost sure duplicates beyond about
qbirthday(1 - 1e-6, classes = 2e9) # 235,000
Changing to the Wichmann-Hill generator indeed reduces the chance of duplicates:
RNGkind("Wich")
sum(duplicated(runif(1e6)))
[1] 0
sum(duplicated(runif(1e8)))
[1] 0
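As a rough check on the numbers above, here is a minimal sketch of the expected duplicate count. It assumes the generator produces exactly 2^32 equally likely values, which is only the upper bound stated in ?Random; the helper name expected_duplicates is made up for illustration.

```r
# Expected value of sum(duplicated(x)) for n draws from m equally likely values.
# Assumption: m = 2^32 (the upper bound quoted from ?Random).
expected_duplicates <- function(n, m = 2^32) {
  # Expected number of distinct values: m * (1 - (1 - 1/m)^n),
  # written with log1p/expm1 to avoid cancellation for large m.
  expected_distinct <- m * -expm1(n * log1p(-1 / m))
  n - expected_distinct
}

expected_duplicates(1e6)  # on the order of 100, in line with the ~110 quoted above
expected_duplicates(1e7)  # on the order of 10^4, i.e. roughly 0.1% of 1e7 draws
```

Under that assumption the 10,000,000-draw case in the question loses about a tenth of a percent of its values to duplicates, which is consistent with the unique fraction reported for runif.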
The documentation for random number generation (?Random) says:
Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values (Wichmann-Hill is the exception, and all give at least 30 varying bits.)
By the birthday paradox you would expect to see repeated values in a set of more than roughly 2^16 values, and 10000000 > 2^16. I haven't found anything directly in the documentation about how many distinct values rnorm will return, but it is presumably larger than 2^32. It is interesting to note that set.seed has separate parameters: kind, which determines the uniform generator, and normal.kind, which determines the normal generator, so the latter is not a simple transformation of the former.
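The 2^16 figure can be checked with qbirthday and pbirthday from the stats package. This is only a sketch, again assuming 2^32 equally likely values for the uniform generator.

```r
# Assumption: the uniform generator yields 2^32 equally likely values.
classes <- 2^32

# Smallest sample size with at least a 50% chance of one repeated value.
n50 <- qbirthday(0.5, classes = classes)
n50  # somewhat above 2^16 = 65536

# Probability of at least one repeat among 1e7 draws.
p_repeat <- pbirthday(1e7, classes = classes)
p_repeat  # effectively 1

# The uniform and normal generators currently in use are reported separately.
RNGkind()
```

With 10,000,000 draws the sample is far past the 50% threshold, so repeats in runif output are expected rather than surprising.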