I was looking into the RNG of base R and was curious whether its 32-bit Mersenne-Twister implementation might become a limitation when a very large number of random values is needed, so I ran a simple test:
set.seed(8)
length(unique(runif(1e8)))
# [1] 98845641
1e8 - 98845641
# [1] 1154359
So there are indeed numerous duplicates among the 100 million draws.
When I switch to the 64-bit version of the MT RNG implemented by the dqrng package, the problem does not appear.
Does "64-bit" here refer to the type of floating-point numbers used?
Am I right to conclude that, because of the much larger set of possible values (64-bit vs 32-bit floating point), duplicates are less likely when using the 64-bit MT?
From ?Random:
Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values.
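You can see the 2^32 limit directly by checking that every draw is an integer multiple of 2^-32. This is a minimal sketch that assumes the default "Mersenne-Twister" kind, where R scales each 32-bit integer output by 2^-32. Other RNG kinds are not covered by this check.

```r
# Assumption: default RNGkind "Mersenne-Twister", which scales a 32-bit integer by 2^-32
RNGkind("Mersenne-Twister")
set.seed(8)
u <- runif(1e6)

# If every draw equals k / 2^32 for some integer k, then u * 2^32 has no fractional part
k <- u * 2^32
all(k == round(k))
```

If this returns TRUE, the generator can produce at most 2^32 distinct values, no matter how many digits of each double are printed.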
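Indeed, with M = 2^32 possible values, the expected number of draws whose value is shared with at least one other draw is n * (1 - (1 - 1/M)^(n - 1)). Almost all collisions are pairs, so halving this approximates the number of duplicates that unique() removes: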
M <- 2^32
n <- 1e8
(n * (1 - (1 - 1 / M)^(n - 1))) / 2
# [1] 1150705
which is very close to your result.
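As a cross-check, the birthday-problem expectation of n - length(unique(x)) can also be computed exactly. It equals n minus the expected number of distinct values, M * (1 - (1 - 1/M)^n). The sketch below also plugs in a hypothetical generator that returns multiples of 2^-53, which is a common way to build doubles from 64-bit output. That choice is an assumption for illustration and is not a statement about how dqrng works internally.

```r
# Birthday-problem expectations for n draws from M equally likely values
M <- 2^32   # at most 2^32 distinct values for a 32-bit generator (per ?Random)
n <- 1e8

# Approximation used above: draws colliding with at least one other draw, halved
approx_dups <- (n * (1 - (1 - 1 / M)^(n - 1))) / 2

# Exact expected count of n - length(unique(x)):
# n minus the expected number of distinct values, M * (1 - (1 - 1/M)^n).
# log1p()/expm1() keep precision when 1/M is tiny.
exact_dups <- n - M * (-expm1(n * log1p(-1 / M)))

# Hypothetical generator returning multiples of 2^-53 (assumption, for comparison)
M53 <- 2^53
dups_53bit <- n - M53 * (-expm1(n * log1p(-1 / M53)))

c(approx = approx_dups, exact = exact_dups, bits53 = dups_53bit)
```

The exact expectation comes out slightly higher than the halved approximation, and both are close to the observed count. With 2^53 possible values, fewer than one duplicate is expected in 1e8 draws. This is consistent with the idea that a larger space of possible values makes duplicates far less likely.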