Running the following code in R 4.1.1 gives different results between platforms.
set.seed(1)
x <- rnorm(3)[3]
print(x, 22)
# -0.83562861241004716 # intel windows
# -0.8356286124100471557341 # m1 mac
print(round(x, 15), 22)
# -0.83562861241004704 # intel windows
# -0.8356286124100470447118 # m1 mac
I know the size of the difference is below .Machine$double.eps and that the extra digits carry no meaningful information. Still, I am not happy that the diverging digits exist at all. How can I ensure exactly consistent results across platforms? Is there an RNG library that achieves this?
EDIT:
The bit representation is different.
set.seed(1)
x <- rnorm(100)
x <- sum(x)
SoDA::binaryRep(x)
.10101110001110000100001111110111000010011001011111011 # intel windows
.10101110001110000100001111110111000010011001011111110 # m1 mac
Bits are also different in runif(), which suggests that the uniform-to-Gaussian conversion is not the only breaking point.
set.seed(1)
x <- runif(10000000)
x <- sum(x)
SoDA::binaryRep(x)
# kind = "Mersenne-Twister"
.10011000100101000110100110111100101000100000101100000 # intel windows
.10011000100101000110100110111100101000011111001100000 # m1 mac
# kind = "Wichmann-Hill"
.10011000100111111110101000100001001001010100000011011 # intel windows
.10011000100111111110101000100001001001010100001001010 # m1 mac
# kind = "Marsaglia-Multicarry"
.10011000100011100110000010000001011100011110100001110 # intel windows
.10011000100011100110000010000001011100011110001010000 # m1 mac
# kind = "Super-Duper"
.10011000100010011010010110100001000101100011101011110 # intel windows
.10011000100010011010010110100001000101100100001111101 # m1 mac
# kind = "Knuth-TAOCP-2002"
.10011000101000110101010111000111010011101001000101100 # intel windows
.10011000101000110101010111000111010011101001000101101 # m1 mac
# kind = "Knuth-TAOCP"
.10011000100110001011010011000001011001001110011111000 # intel windows
.10011000100110001011010011000001011001001110011111001 # m1 mac
# kind = "L'Ecuyer-CMRG"
.10011000100100010110100101101001011000000111010110101 # intel windows
.10011000100100010110100101101001011000001000010100001 # m1 mac
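For reference, the same bit patterns can be inspected without SoDA; a minimal base-R sketch (the "%a" format is C99, so support depends on the platform's C library):
set.seed(1)
x <- sum(runif(10000000))
sprintf("%a", x)     # hexadecimal significand and exponent of the double
writeBin(x, raw())   # the raw IEEE-754 bytes, in native byte order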
Floating-point decimal values generally do not have an exact binary representation, because of how the CPU represents floating-point data. For this reason you may experience a loss of precision, and some floating-point operations may produce unexpected results.
The single-precision type float has 24 bits of precision, equivalent to only about 7 decimal digits (the rest of the 32 bits hold the sign and exponent), and the number of digits of precision is the same no matter the magnitude of the number. R's numeric type is a 64-bit IEEE-754 double with a 53-bit significand, about 15-16 significant decimal digits, so printed digits beyond that point carry no information.
In programming, a floating-point number is one where the position of the decimal point can "float" rather than being fixed within the number; examples are 1.23, 87.425, and 9039454.2.
Floating-point representations have a base β (which is always assumed to be even) and a precision p. If β = 10 and p = 3, then the number 0.1 is represented as 1.00 × 10^-1. If β = 2 and p = 24, then the decimal number 0.1 cannot be represented exactly; it is approximately 1.10011001100110011001101 × 2^-4.
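You can see this directly in R with base functions only (the exact printed digits may vary with the platform's C library):
print(0.1, digits = 22)   # 0.1000000000000000055511: the nearest double to 0.1
sprintf("%.20f", 0.1)     # "0.10000000000000000555"
0.1 + 0.2 == 0.3          # FALSE: the two sides round differently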
(Comments from Oct. 29 and Nov. 2 moved here and edited.)
I should note that such subtle reproducibility issues with pseudorandom number generators (PRNGs) can occur whenever floating-point arithmetic is involved. For instance, Intel's instruction set architecture can use 80-bit extended precision for internal arithmetic. Extended precision, though, is only one of many ways that floating-point arithmetic can lead to non-reproducible pseudorandom numbers; Intel's and Arm's instruction set architectures are different enough on their own to cause reproducibility issues. (If I understand correctly, Apple's M1 chip implements an Arm instruction set.)
By contrast, integer arithmetic has fewer reproducibility problems.
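A one-line illustration of the difference, using nothing platform-specific:
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # FALSE: float addition is not associative,
                                         # so any change in evaluation order changes bits
(1L + 2L) + 3L == 1L + (2L + 3L)         # TRUE: integer addition is exact (absent overflow)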
Thus, if bit-for-bit reproducibility matters to you, try to find an R PRNG that uses only integer operations. (Indeed, computers generate pseudorandom floating-point numbers from integers, not the other way around, and most PRNGs produce integers, not floating-point numbers.)
For instance, for uniform variates, take the integer output of the Mersenne Twister without further manipulation. For Gaussian (and exponential) variates, there is fortunately an algorithm by Karney (2016) that generates arbitrary-precision variates using only integer operations. Rational arithmetic built on underlying integer operations is another option.
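To make the integer-only idea concrete, here is a minimal sketch of the classic Park-Miller "minimal standard" Lehmer generator, computed with Schrage's trick so that every intermediate value fits in a signed 32-bit integer; the function names are mine for illustration, and this is not one of R's built-in RNG kinds:
# state' = 16807 * state mod (2^31 - 1), pure 32-bit integer arithmetic,
# hence bit-for-bit reproducible across platforms
pm_next <- function(state) {
  a <- 16807L; m <- 2147483647L
  q <- 127773L    # m %/% a
  r <- 2836L      # m %%  a
  s <- a * (state %% q) - r * (state %/% q)   # never overflows: |s| < m
  if (s <= 0L) s + m else s
}
pm_stream <- function(n, seed = 1L) {   # seed must be in 1..(m - 1)
  out <- integer(n)
  s <- seed
  for (i in seq_len(n)) { s <- pm_next(s); out[i] <- s }
  out
}
ints <- pm_stream(5)     # identical integers on every platform
u <- ints / 2147483647   # a single IEEE-754 division is correctly rounded,
                         # so even this conversion to (0,1) should reproduce exactly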
REFERENCES:
Karney, C.F.F. (2016). Sampling exactly from the normal distribution. ACM Transactions on Mathematical Software (TOMS), 42(1), pp. 1-14.