There are two obvious ways to generate a random digit from 0 to 9 in Python. One could generate a random floating point number between 0 and 1, multiply by 10, and round down. Alternatively, one could use the random.randint method.
    import random

    def random_digit_1():
        return int(10 * random.random())

    def random_digit_2():
        return random.randint(0, 9)
I was curious about what would happen if one generated a random number between 0 and 1, and kept the last digit. I didn't necessarily expect the distribution to be uniform, but I found the result quite surprising.
    from random import random, seed
    from collections import Counter

    seed(0)
    counts = Counter(int(str(random())[-1]) for _ in range(1_000_000))
    print(counts)
Output:
Counter({1: 84206, 5: 130245, 3: 119433, 6: 129835, 8: 101488, 2: 100861, 9: 84796, 4: 129088, 7: 120048})
A histogram is shown below. Note that 0 does not appear, since trailing zeros are truncated. But can anyone explain why the digits 4, 5, and 6 are more common than the rest? I used Python 3.6.10, but the results were similar in Python 3.8.0a4.
That's not "the last digit" of the number. That's the last digit of the string str gave you when passed the number.
When you call str on a float, Python gives you enough digits that calling float on the string will give you the original float. For this purpose, a trailing 1 or 9 is less likely to be necessary than other digits, because a trailing 1 or 9 means the number is very close to the value you'd get by rounding off that digit. There's a good chance no other floats are closer, and if so, that digit can be discarded without sacrificing float(str(original_float)) behavior.
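As a rough illustration of that round-trip behavior, the sketch below compares str with a fixed 17-digit format. The helper name round_trips is made up for this example and is not part of any library.

```python
from random import random, seed

def round_trips(x: float) -> bool:
    """Return True if converting x to a string and back recovers x exactly."""
    return float(str(x)) == x

seed(0)
samples = [random() for _ in range(10_000)]

# str() always emits enough digits to recover the original float...
all_round_trip = all(round_trips(x) for x in samples)
print(all_round_trip)

# ...but no more than necessary: "0.1" is enough to recover the float 0.1,
# even though 17 significant digits of the stored value look different.
print(str(0.1))             # 0.1
print(format(0.1, ".17g"))  # 0.10000000000000001
```

The 17-digit form ends in 1, and that is exactly the kind of trailing digit str can drop because the shorter string still maps back to the same float.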
If str gave you enough digits to exactly represent the argument, the last digit would almost always be 5, except when random.random() returns 0.0, in which case the last digit would be 0. (Floats can only represent dyadic rationals, and the last nonzero decimal digit of a non-integer dyadic rational is always 5.) The outputs would also be extremely long, looking like
    >>> import decimal, random
    >>> print(decimal.Decimal(random.random()))
    0.29711195452007921335990658917580731213092803955078125
which is one of the reasons str doesn't do that.
If str gave you exactly 17 significant digits (enough to distinguish all float values from each other, but sometimes more digits than necessary), then the effect you're seeing would disappear. There would be a nearly uniform distribution of trailing digits (including 0).
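One way to check this is to format each float with exactly 17 significant digits and count the final digit. This is only a sketch: the helper last_of_17_digits is a hypothetical name, and the ".16e" format is just one convenient way to force 17 significant digits.

```python
from collections import Counter
from random import random, seed

def last_of_17_digits(x: float) -> int:
    """Last digit of x when written with exactly 17 significant digits."""
    # ".16e" gives one digit before the point and 16 after it: 17 in total.
    mantissa = f"{x:.16e}".split("e")[0]
    return int(mantissa[-1])

seed(0)
n = 100_000
counts = Counter(last_of_17_digits(random()) for _ in range(n))

# Print the tally for every digit, including 0.
for digit in range(10):
    print(digit, counts[digit])
```

With a fixed digit count no trailing digits are dropped, so each of the ten digits should land close to n / 10.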
(Also, you forgot that str sometimes returns a string in scientific notation, but that's a minor effect, because there's a low probability of getting a float where that would happen out of random.random().)
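For reference, here is a small sketch of when that happens. The helper uses_scientific_notation is a made-up name for illustration; the behavior shown is that of str on CPython 3.

```python
def uses_scientific_notation(x: float) -> bool:
    """Return True if str(x) contains an exponent part."""
    return "e" in str(x)

# Values below 1e-4 switch to scientific notation, so the final character
# of the string is a digit of the exponent, not of the mantissa.
print(str(0.00012345))   # 0.00012345
print(str(0.000012345))  # 1.2345e-05

print(uses_scientific_notation(0.00012345))
print(uses_scientific_notation(0.000012345))
```

Since random.random() is uniform on [0, 1), only about one draw in 10,000 falls below 1e-4, which is why the effect on the counts is small.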
TL;DR Your example is not actually looking at the last digit. The last digit of a finite binary-represented mantissa converted to base-10 should always be 0 or 5.
Take a look at the comment in cpython/pystrtod.c:
    char * PyOS_double_to_string(double val,
                                 char format_code,
                                 int precision,
                                 int flags,
                                 int *type)
    {
        char format[32];
        Py_ssize_t bufsize;
        char *buf;
        int t, exp;
        int upper = 0;

        /* Validate format_code, and map upper and lower case */
        switch (format_code) {
        // ...
        case 'r':          /* repr format */
            /* Supplied precision is unused, must be 0. */
            if (precision != 0) {
                PyErr_BadInternalCall();
                return NULL;
            }
            /* The repr() precision (17 significant decimal digits) is the
               minimal number that is guaranteed to have enough precision
               so that if the number is read back in the exact same binary
               value is recreated.  This is true for IEEE floating point
               by design, and also happens to work for all other modern
               hardware. */
            precision = 17;
            format_code = 'g';
            break;
        // ...
        }
Wikipedia confirms this:
The 53-bit significand precision gives from 15 to 17 significant decimal digits precision (2^-53 ≈ 1.11 × 10^-16). If a decimal string with at most 15 significant digits is converted to IEEE 754 double-precision representation, and then converted back to a decimal string with the same number of digits, the final result should match the original string. If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.
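The two round-trip guarantees in that quote can be checked empirically. The following is a minimal sketch; the helper survives and the sample string are illustrative choices, not anything taken from CPython.

```python
from random import random, seed

def survives(x: float, digits: int) -> bool:
    """True if x is recovered after formatting with `digits` significant digits."""
    return float(f"{x:.{digits}g}") == x

seed(0)
samples = [random() for _ in range(10_000)]

# Share of floats that survive float -> string -> float at each precision.
for digits in (15, 16, 17):
    share = sum(survives(x, digits) for x in samples) / len(samples)
    print(digits, share)

# The other direction: a 15-digit decimal string survives string -> float -> string.
text = "0.123456789012345"
print(f"{float(text):.15g}" == text)
```

Only the 17-digit column should reach a share of exactly 1.0; with 15 digits many distinct floats collapse onto the same string.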
Thus, when we use str (or repr), we are representing at most 17 significant digits in base-10. This means part of the float's exact decimal expansion is cut off. In fact, to get the exact representation of a value returned by random(), you may need as many as 53 significant digits! You can verify this as follows:
    >>> counts = Counter(
    ...     len(f"{random():.99f}".lstrip("0.").rstrip("0"))
    ...     for _ in range(1000000)
    ... )
    >>> counts
    Counter({53: 449833, 52: 270000, 51: 139796, 50: 70341, 49: 35030, 48: 17507, 47: 8610, 46: 4405, 45: 2231, 44: 1120, 43: 583, 42: 272, 41: 155, 40: 60, 39: 25, 38: 13, 37: 6, 36: 5, 35: 4, 34: 3, 32: 1})
    >>> max(counts)
    53
Now using the maximum precision, here's the proper way to find the "last digit":
    >>> counts = Counter(
    ...     int(f"{random():.53f}".lstrip("0.").rstrip("0")[-1])
    ...     for _ in range(1000000)
    ... )
    >>> counts
    Counter({5: 1000000})
Thus, the last digit is always 5. (Or, in very rare cases, 0.) This makes sense since:
    2**0  == 1.0
    2**-1 == 0.5
    2**-2 == 0.25
    2**-3 == 0.125
    2**-4 == 0.0625
    2**-5 == 0.03125
    2**-6 == 0.015625
    2**-7 == 0.0078125
    2**-8 == 0.00390625
    2**-9 == 0.001953125
    ...
    2**-k == 0.[k-1 digits]5
And all mantissas are some partial sum of these coefficients.
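The same conclusion can be cross-checked without string formatting, using exact conversions from the standard library. This is a sketch under the assumption that the samples come from random.random(); is_power_of_two is a helper written for this example.

```python
from decimal import Decimal
from random import random, seed

def is_power_of_two(n: int) -> bool:
    """True if n is a positive integer power of two (including 1)."""
    return n > 0 and n & (n - 1) == 0

seed(0)
samples = [random() for _ in range(10_000)]

# Every float is an integer divided by a power of two (a dyadic rational).
denominators_ok = all(
    is_power_of_two(x.as_integer_ratio()[1]) for x in samples
)

# Decimal(x) converts the float exactly, so the final digit of its
# coefficient is the true last decimal digit of the stored value.
last_digits = {
    Decimal(x).as_tuple().digits[-1] for x in samples if x != 0.0
}
print(denominators_ok, last_digits)
```

Reading the digit from as_tuple() avoids the scientific notation that str(Decimal(x)) can use for very small values.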
NOTE: As pointed out by user2357112, the correct implementations to look at are PyOS_double_to_string and format_float_short, but I'll leave the current one in because it's more pedagogically interesting.