
R random number generator faulty?

Tags: random, r

I was looking into the RNG of base R and wondered whether its 32-bit implementation of the Mersenne-Twister might become a limitation when a very large number of random numbers is needed, so I did a simple test:

set.seed(8)
length(unique(runif(1e8)))
# [1] 98845641
1e8 - 98845641
# 1154359

So there are indeed many duplicates (1,154,359) among the 100 million draws.
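A scaled-down sketch of the same test, which is quick to reproduce: it uses 1e6 draws instead of 1e8 (an arbitrary size chosen to keep memory use small) and assumes the default Mersenne-Twister kind. The names n_dup and counts are illustrative only. For this size the birthday-problem approximation n^2 / (2 * 2^32) predicts roughly 116 duplicates, and the multiplicity table shows that almost every repeated value appears exactly twice.

```r
# Scaled-down sketch: 1e6 draws instead of 1e8 (arbitrary, to save memory).
# Assumes the default RNG kind, "Mersenne-Twister".
set.seed(8)
u <- runif(1e6)

# duplicated() flags every draw that repeats an earlier one, so this equals
# length(u) - length(unique(u)) from the original test.
n_dup <- sum(duplicated(u))
n_dup

# match(u, u) maps each draw to the index of its first occurrence;
# tabulate() then counts how often each distinct value was drawn.
counts <- tabulate(match(u, u))
table(counts[counts > 0])
```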

When I switch to the 64-bit version of the MT RNG implemented by the dqrng package, the problem does not appear.

Question 1:

Does the "64-bit" refer to the type of floating-point numbers used?

Question 2:

Am I right to conclude that, because of the much larger span of possible values (64-bit FP vs 32-bit FP), duplicates are less likely when using the 64-bit MT?

asked Oct 27 '18 at 18:10 by R J


1 Answer

From ?Random:

Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values.
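A small check of that documentation note, as a sketch assuming the default "Mersenne-Twister" kind: if each double is a 32-bit integer scaled by 2^-32, then multiplying the draws by 2^32 should leave no fractional part, which caps the number of distinct values at 2^32. The sample size of 1e6 is an arbitrary choice.

```r
# Sketch: assumes the default RNG kind, set explicitly here.
RNGkind("Mersenne-Twister")
set.seed(8)
u <- runif(1e6)

# If each double comes from a 32-bit integer scaled by 2^-32,
# then u * 2^32 is a whole number for every draw.
scaled <- u * 2^32
all(scaled == floor(scaled))
```

When the assumption holds, the last line returns TRUE: every value lies on a grid of spacing 2^-32, far coarser than the 2^-53 resolution a double can represent near 1.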

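Indeed, halving the expected number of draws that share their value with at least one other draw gives a close approximation to the expected number of duplicates (nearly every repeated value occurs exactly twice):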

M <- 2^32
n <- 1e8
(n * (1 - (1 - 1 / M)^(n - 1))) / 2
# [1] 1150705

which is very close to the 1,154,359 duplicates you observed.
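The formula above is an approximation. Under the same assumption of n independent draws from M equally likely values, the expected number of distinct values is M * (1 - (1 - 1/M)^n), so the expected number of values removed by unique() is n minus that. The sketch below (expected_dups is a hypothetical helper name) uses expm1() and log1p() to keep the computation accurate when 1/M is tiny. The second call plugs in M = 2^53, which assumes, purely for illustration of Question 2 and not as a statement about dqrng's internals, a generator that fills all 53 mantissa bits of a double.

```r
# Expected number of duplicates, n - E[#distinct], for n independent
# uniform draws from M equally likely values (hypothetical helper).
# E[#distinct] = M * (1 - (1 - 1/M)^n); expm1/log1p avoid cancellation.
expected_dups <- function(n, M) {
  n - M * -expm1(n * log1p(-1 / M))
}

expected_dups(1e8, 2^32)  # 32-bit resolution: about 1.155 million
expected_dups(1e8, 2^53)  # assumed 53-bit resolution: about 0.56
```

For M = 2^32 this gives about 1,155,171, slightly above the halved formula and within about one standard deviation of the observed count; for M = 2^53 fewer than one duplicate is expected in 1e8 draws.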

answered Oct 01 '22 at 01:10 by Weihuang Wong