In R, how can I get PRNG to give identical floating point numbers between platforms?

Running the following code in R 4.1.1 gives different results between platforms.

set.seed(1)
x <- rnorm(3)[3]
print(x, 22)

# -0.83562861241004716      # intel windows
# -0.8356286124100471557341 # m1 mac

print(round(x, 15), 22)
# -0.83562861241004704      # intel windows
# -0.8356286124100470447118 # m1 mac

I know the size of the difference is below .Machine$double.eps and that the extra digits carry no meaningful information.

Still, I am not happy that the representations differ at all. How can I ensure exactly consistent results across platforms? Is there an RNG library that achieves this?

EDIT:

The bit representations differ:

set.seed(1)
x <- rnorm(100)
x <- sum(x)
SoDA::binaryRep(x)

.10101110001110000100001111110111000010011001011111011 # intel windows
.10101110001110000100001111110111000010011001011111110 # m1 mac

The bits also differ for runif(), with every RNG kind I tried, which suggests the uniform-to-Gaussian conversion is not the only breaking point.

set.seed(1)
x <- runif(10000000)
x <- sum(x)
SoDA::binaryRep(x)

# kind = "Mersenne-Twister"
.10011000100101000110100110111100101000100000101100000 # intel windows
.10011000100101000110100110111100101000011111001100000 # m1 mac
# kind = "Wichmann-Hill"
.10011000100111111110101000100001001001010100000011011 # intel windows
.10011000100111111110101000100001001001010100001001010 # m1 mac
# kind = "Marsaglia-Multicarry"
.10011000100011100110000010000001011100011110100001110 # intel windows
.10011000100011100110000010000001011100011110001010000 # m1 mac
# kind = "Super-Duper"
.10011000100010011010010110100001000101100011101011110 # intel windows
.10011000100010011010010110100001000101100100001111101 # m1 mac
# kind = "Knuth-TAOCP-2002"
.10011000101000110101010111000111010011101001000101100 # intel windows
.10011000101000110101010111000111010011101001000101101 # m1 mac
# kind = "Knuth-TAOCP"
.10011000100110001011010011000001011001001110011111000 # intel windows
.10011000100110001011010011000001011001001110011111001 # m1 mac
# kind = "L'Ecuyer-CMRG"
.10011000100100010110100101101001011000000111010110101 # intel windows
.10011000100100010110100101101001011000001000010100001 # m1 mac
Asked Oct 28 '21 by LambdaPsi


1 Answer

(Comments from Oct. 29 and Nov. 2 moved here and edited.)

Such subtle reproducibility issues with pseudorandom number generators (PRNGs) can occur whenever floating-point arithmetic is involved. For instance, Intel's instruction set architecture may use 80-bit extended precision for internal arithmetic. Extended precision, though, is only one of many ways floating-point arithmetic can produce non-reproducible pseudorandom numbers: Intel's and Arm's instruction set architectures differ enough on their own to cause reproducibility issues. (If I understand correctly, Apple's M1 chip uses an Arm instruction set.)

By contrast, integer arithmetic has fewer reproducibility problems.

Thus, if bit-for-bit reproducibility matters to you, look for an R PRNG that uses only integer operations. (Indeed, computers generate pseudorandom floating-point numbers from integers, not the other way around, and most PRNGs produce integers, not floating-point numbers.)
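As a concrete illustration of the integer-only idea (a minimal sketch, not a recommendation of this particular generator), here is Marsaglia's xorshift32 written with R's 32-bit integer bit operations. No floating-point arithmetic is involved, so the stream is bit-for-bit identical on any platform:

```r
# Sketch: Marsaglia's xorshift32 using only R's integer bit operations.
# Not cryptographic and not a drop-in replacement for R's RNG -- purely
# an illustration that an integer-only PRNG reproduces exactly everywhere.
xorshift32 <- function(state) {
  state <- bitwXor(state, bitwShiftL(state, 13L))
  state <- bitwXor(state, bitwShiftR(state, 17L))  # logical (unsigned) shift
  bitwXor(state, bitwShiftL(state, 5L))
}

s <- 123456789L            # any nonzero 32-bit integer seed
draws <- integer(5)
for (i in seq_len(5)) {
  s <- xorshift32(s)
  draws[i] <- s
}
draws                      # same integer sequence on Intel, Arm, ...
```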

For instance, for uniform variates, take the integer output of the Mersenne Twister algorithm without manipulating it. For Gaussian (and exponential) random variates, there is fortunately an algorithm by Karney that generates arbitrary-precision variates with only integer operations. Also, consider rational arithmetic built on underlying integer operations.

REFERENCES:

Karney, C.F.F., 2016. Sampling exactly from the normal distribution. ACM Transactions on Mathematical Software (TOMS), 42(1), pp.1-14.

Answered Nov 10 '22 by Peter O.