
Why does runif() have fewer unique values than rnorm()?

Tags: random, r

If you run code like:

length(unique(runif(10000000)))
length(unique(rnorm(10000000)))

you'll see that only about 99.8% of the runif values are unique, but 100% of the rnorm values are. I thought this might be because of the constrained range, but widening the range to (0, 100000) for runif doesn't change the result. Continuous distributions should have zero probability of repeats, and I know that isn't quite true in floating-point arithmetic, but I'm curious why we don't see roughly the same number of repeats from the two.

jntrcs asked Jul 19 '18 14:07


2 Answers

This is due primarily to the properties of the default PRNG. (runif also has a smaller range than rnorm, and therefore fewer representable values, which may produce a similar effect at some point even if the RNG itself doesn't.) It is discussed somewhat obliquely in ?Random:

Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values (Wichmann-Hill is the exception, and all give at least 30 varying bits.)

With the example:

sum(duplicated(runif(1e6))) # around 110 for default generator
## and we would expect about almost sure duplicates beyond about
qbirthday(1 - 1e-6, classes = 2e9) # 235,000
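
As a rough sanity check on those numbers, here is a minimal sketch of the expected duplicate count. It assumes the default generator produces about 2^32 equally likely values, which is my reading of the quoted passage; `expected_duplicates` is a helper name made up for this illustration.

```r
# Expected number of duplicated draws when n values are sampled from
# m equally likely classes.
# Assumption: m = 2^32 for the default uniform generator.
expected_duplicates <- function(n, m = 2^32) {
  # Expected number of distinct classes hit is m * (1 - (1 - 1/m)^n);
  # expm1/log1p keep the tiny probabilities numerically stable.
  expected_distinct <- m * -expm1(n * log1p(-1 / m))
  n - expected_distinct
}

expected_duplicates(1e6)        # compare with the "around 110" in ?Random
expected_duplicates(1e7)        # the sample size used in the question
expected_duplicates(1e7) / 1e7  # share of draws expected to be repeats
```

For 1e7 draws this comes out to roughly 0.1% of the values, which is in line with the fraction of non-unique runif values reported in the question.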

Changing to the Wichmann-Hill generator indeed reduces the chance of duplicates:

RNGkind("Wich")  # partial match for "Wichmann-Hill"
sum(duplicated(runif(1e6)))
[1] 0
sum(duplicated(runif(1e8)))
[1] 0
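
Since RNGkind changes the generator for the rest of the session, it may be worth restoring the previous one after the experiment. A small sketch of that, with an arbitrary seed used only for reproducibility:

```r
# Remember the generators currently in use so they can be restored.
old_kinds <- RNGkind()

RNGkind("Wichmann-Hill")   # full name of the generator abbreviated as "Wich"
set.seed(123)              # arbitrary seed, only to make the run repeatable
dups_wh <- sum(duplicated(runif(1e6)))

# Switch back to whatever uniform generator was active before.
RNGkind(old_kinds[1])

dups_wh
```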
James answered Oct 05 '22 22:10


The documentation for random number generation says:

Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values (Wichmann-Hill is the exception, and all give at least 30 varying bits.)

By the birthday paradox, you would expect to see repeated values in a sample of more than roughly 2^16 values (the square root of 2^32), and 10000000 > 2^16. I haven't found anything in the documentation that says directly how many distinct values rnorm can return, but it is presumably more than 2^32. It is interesting to note that set.seed has separate parameters: kind, which determines the uniform generator, and normal.kind, which determines the normal generator, so the latter is not a simple transformation of the former.
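
Both points can be probed with a short sketch. The first part uses qbirthday to find the sample size at which a repeat becomes more likely than not, assuming 2^32 equally likely uniform values. The second part rests on my understanding, not stated in the documentation quoted above, that the default normal.kind of "Inversion" consumes two uniform draws per normal draw; it checks that by comparing positions in the random stream.

```r
# Sample size at which a repeated value becomes more likely than not,
# assuming 2^32 equally likely classes.
n_half <- qbirthday(prob = 0.5, classes = 2^32)
n_half
n_half / 2^16   # how this compares with the 2^16 rule of thumb

# Assumption: with normal.kind = "Inversion", one rnorm() draw uses up
# two uniform draws. If so, the next runif() value after one rnorm()
# should equal the third runif() value from the same seed.
old_kinds <- RNGkind("Mersenne-Twister", "Inversion")

set.seed(42)
invisible(rnorm(1))
after_rnorm <- runif(1)

set.seed(42)
invisible(runif(2))
after_two_unif <- runif(1)

same_stream_position <- identical(after_rnorm, after_two_unif)
same_stream_position

# Restore the generators that were active before.
RNGkind(old_kinds[1], old_kinds[2])
```

If the comparison comes out TRUE, each normal value is built from more random bits than a single uniform value, which would fit with rnorm showing far fewer repeats.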

John Coleman answered Oct 05 '22 22:10