Distribution of final digits of random numbers in Python

Tags: python, random

There are two obvious ways to generate a random digit from 0 to 9 in Python. One could generate a random floating point number between 0 and 1, multiply by 10, and round down. Alternatively, one could use the random.randint method.

    import random

    def random_digit_1():
        return int(10 * random.random())

    def random_digit_2():
        return random.randint(0, 9)
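As a quick sanity check on the two approaches above, here is a minimal sketch that tallies the digits each one produces. The helper `digit_counts`, the seed, and the trial count are illustrative assumptions, not part of the original question.

```python
import random
from collections import Counter

def random_digit_1():
    # Scale a float in [0, 1) to [0, 10) and truncate toward zero.
    return int(10 * random.random())

def random_digit_2():
    # Draw an integer uniformly from 0..9, both ends included.
    return random.randint(0, 9)

def digit_counts(generator, trials=100_000, seed_value=0):
    """Hypothetical helper: tally the digits produced by `generator`."""
    random.seed(seed_value)
    return Counter(generator() for _ in range(trials))

counts_1 = digit_counts(random_digit_1)
counts_2 = digit_counts(random_digit_2)

# Print the two tallies side by side; each digit should appear
# roughly trials / 10 times with either approach.
for digit in range(10):
    print(digit, counts_1[digit], counts_2[digit])
```

Both tallies should come out close to flat, which is what makes the skewed result in the experiment that follows stand out.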

I was curious about what would happen if one generated a random number between 0 and 1, and kept the last digit. I didn't necessarily expect the distribution to be uniform, but I found the result quite surprising.

    from random import random, seed
    from collections import Counter

    seed(0)
    counts = Counter(int(str(random())[-1]) for _ in range(1_000_000))
    print(counts)

Output:

    Counter({1: 84206,
             5: 130245,
             3: 119433,
             6: 129835,
             8: 101488,
             2: 100861,
             9: 84796,
             4: 129088,
             7: 120048})

A histogram is shown below. Note that 0 does not appear, since trailing zeros are truncated. But can anyone explain why the digits 4, 5, and 6 are more common than the rest? I used Python 3.6.10, but the results were similar in Python 3.8.0a4.

[Figure: Distribution of final digits of random floats]

376

asked Apr 25 '20 01:04

Dave Radcliffe

2 Answers

That's not "the last digit" of the number. That's the last digit of the string str gave you when passed the number.

When you call str on a float, Python gives you enough digits that calling float on the string will give you the original float. For this purpose, a trailing 1 or 9 is less likely to be necessary than other digits, because a trailing 1 or 9 means the number is very close to the value you'd get by rounding off that digit. There's a good chance no other floats are closer, and if so, that digit can be discarded without sacrificing float(str(original_float)) behavior.
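The round-trip behavior described above can be checked empirically. The following is a rough sketch, not a description of CPython's actual algorithm: `shortest_precision` is a hypothetical helper that brute-forces the fewest significant digits needed to recover a float, and the seed and sample size are arbitrary.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed, only for reproducibility
samples = [random.random() for _ in range(10_000)]

# str() always yields a string that converts back to the same float.
round_trips = all(float(str(x)) == x for x in samples)

def shortest_precision(x):
    """Hypothetical helper: fewest significant digits that round-trip x."""
    for digits in range(1, 18):
        # ".Ne" prints N + 1 significant digits, hence digits - 1.
        if float(f"{x:.{digits - 1}e}") == x:
            return digits
    raise AssertionError("17 significant digits always suffice for a double")

# How many samples need 15, 16, or 17 digits to round-trip?
precisions = Counter(shortest_precision(x) for x in samples)
print(round_trips)
print(sorted(precisions.items()))
```

Because the number of digits kept varies from float to float, the final character of `str(x)` is a side effect of the shortening step rather than a fixed decimal place of the number.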

If str gave you enough digits to exactly represent the argument, the last digit would almost always be 5, except when random.random() returns 0.0, in which case the last digit would be 0. (Floats can only represent dyadic rationals, and the last nonzero decimal digit of a non-integer dyadic rational is always 5.) The outputs would also be extremely long, looking like

    >>> import decimal, random
    >>> print(decimal.Decimal(random.random()))
    0.29711195452007921335990658917580731213092803955078125

which is one of the reasons str doesn't do that.
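The dyadic-rational claim can be inspected directly. This is a small sketch using only the standard library; the seed is an arbitrary choice, and the check assumes the drawn value is nonzero, which is overwhelmingly likely.

```python
import random
from decimal import Decimal

random.seed(0)  # arbitrary seed, only for reproducibility
x = random.random()

# Every finite float is an exact ratio of integers whose denominator,
# in lowest terms, is a power of two.
numerator, denominator = x.as_integer_ratio()
is_power_of_two = denominator & (denominator - 1) == 0

# Decimal(float) converts exactly, with no rounding, so its digit tuple
# holds the complete decimal expansion of x.
exact = Decimal(x)
last_digit = exact.as_tuple().digits[-1]

print(numerator, denominator, is_power_of_two)
print(exact, last_digit)
```

Reading the digit from `as_tuple()` rather than from `str(exact)` avoids any dependence on how `Decimal` chooses to print very small values.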

If str gave you exactly 17 significant digits (enough to distinguish all float values from each other, but sometimes more digits than necessary), then the effect you're seeing would disappear. There would be a nearly uniform distribution of trailing digits (including 0).
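To try the fixed-precision idea, one can format each float with exactly 17 significant digits and tally the final one. This is a sketch under assumptions: the `.16e` format (one digit before the point plus 16 after) stands in for "exactly 17 significant digits", and the seed and trial count are arbitrary.

```python
from collections import Counter
from random import random, seed

seed(0)  # arbitrary seed, only for reproducibility
trials = 100_000

def last_of_17_digits(x):
    """Hypothetical helper: 17th significant digit of x."""
    mantissa = f"{x:.16e}".split("e")[0]  # e.g. '8.4442185152504812'
    return int(mantissa[-1])

counts = Counter(last_of_17_digits(random()) for _ in range(trials))

# Every digit, including 0, should show up roughly trials / 10 times.
for digit in range(10):
    print(digit, counts[digit])
```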

(Also, you forgot that str sometimes returns a string in scientific notation, but that's a minor effect, because there's a low probability of getting a float where that would happen out of random.random().)
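The scientific-notation case is easy to reproduce by hand. In this sketch the value `0.00001234` is just an illustrative example. `str` switches to exponent form for floats below 1e-4, which `random.random()` returns about once per 10,000 draws.

```python
small = 0.00001234
text = str(small)

# The last character of the string belongs to the exponent,
# not to the digits of the number.
naive_last = text[-1]

# Splitting off the exponent first recovers the last printed digit.
mantissa_last = text.split("e")[0][-1]

print(text)           # 1.234e-05
print(naive_last)     # 5
print(mantissa_last)  # 4

# Values at or above 1e-4 are still printed in plain notation.
print(str(0.0001))    # 0.0001
```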

131

answered Oct 10 '22 07:10

user2357112 supports Monica

TL;DR Your example is not actually looking at the last digit. The last digit of a finite binary-represented mantissa converted to base-10 should always be 0 or 5.

Take a look at the comment in cpython/pystrtod.c:

    char * PyOS_double_to_string(double val,
                                             char format_code,
                                             int precision,
                                             int flags,
                                             int *type)
    {
        char format[32];
        Py_ssize_t bufsize;
        char *buf;
        int t, exp;
        int upper = 0;

        /* Validate format_code, and map upper and lower case */
        switch (format_code) {
        // ...
        case 'r':          /* repr format */
            /* Supplied precision is unused, must be 0. */
            if (precision != 0) {
                PyErr_BadInternalCall();
                return NULL;
            }
            /* The repr() precision (17 significant decimal digits) is the
               minimal number that is guaranteed to have enough precision
               so that if the number is read back in the exact same binary
               value is recreated.  This is true for IEEE floating point
               by design, and also happens to work for all other modern
               hardware. */
            precision = 17;
            format_code = 'g';
            break;
        // ...
    }

Wikipedia confirms this:

The 53-bit significand precision gives from 15 to 17 significant decimal digits precision (2^-53 ≈ 1.11 × 10^-16). If a decimal string with at most 15 significant digits is converted to IEEE 754 double-precision representation, and then converted back to a decimal string with the same number of digits, the final result should match the original string. If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.
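Both guarantees in the quoted passage can be spot-checked on random inputs. This is a sketch, not a proof: the seed and sample sizes are arbitrary, and the 15-digit strings are built by hand as "0." followed by 15 digits with a nonzero leading digit.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility

# Double -> 17 significant digits -> double recovers the original value.
doubles = [random.random() for _ in range(10_000)]
seventeen_ok = all(float(f"{x:.17g}") == x for x in doubles)

# With only 15 significant digits, some doubles are not recovered.
fifteen_not_enough = any(float(f"{x:.15g}") != x for x in doubles)

# 15-digit decimal string -> double -> 15-digit string recovers the string.
decimal_strings = [
    f"0.{random.randrange(10**14, 10**15)}" for _ in range(10_000)
]
fifteen_ok = all(f"{float(s):.15f}" == s for s in decimal_strings)

print(seventeen_ok, fifteen_not_enough, fifteen_ok)
```

The two directions are different guarantees: 15 digits protect a decimal string on its way through a double, while 17 digits protect a double on its way through a decimal string.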

Thus, when we use str (or repr), we are representing at most 17 significant digits in base 10. This means the exact decimal expansion of the floating-point number is cut short. In fact, to get the exact representation of a value returned by random(), you can need as many as 53 significant digits! You can verify this as follows:

    >>> from collections import Counter
    >>> from random import random
    >>> counts = Counter(
    ...     len(f"{random():.99f}".lstrip("0.").rstrip("0"))
    ...     for _ in range(1000000)
    ... )
    >>> counts
    Counter({53: 449833,
             52: 270000,
             51: 139796,
             50: 70341,
             49: 35030,
             48: 17507,
             47: 8610,
             46: 4405,
             45: 2231,
             44: 1120,
             43: 583,
             42: 272,
             41: 155,
             40: 60,
             39: 25,
             38: 13,
             37: 6,
             36: 5,
             35: 4,
             34: 3,
             32: 1})
    >>> max(counts)
    53

Now using the maximum precision, here's the proper way to find the "last digit":

    >>> counts = Counter(
    ...     int(f"{random():.53f}".lstrip("0.").rstrip("0")[-1])
    ...     for _ in range(1000000)
    ... )
    >>> counts
    Counter({5: 1000000})

Thus, the last digit is always 5. (Or, in very rare cases, 0.) This makes sense since:

    2**0  == 1.0
    2**-1 == 0.5
    2**-2 == 0.25
    2**-3 == 0.125
    2**-4 == 0.0625
    2**-5 == 0.03125
    2**-6 == 0.015625
    2**-7 == 0.0078125
    2**-8 == 0.00390625
    2**-9 == 0.001953125
    ...
    2**-k == 0.[k-1 digits]5

And every mantissa is a sum of some of these terms, so the last nonzero digit of its decimal expansion comes from the smallest term in the sum, which ends in 5.
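The same conclusion can be reached with integer arithmetic alone, using the identity m / 2**n == (m * 5**n) / 10**n. The sketch below is one way to express that. `last_decimal_digit` is a hypothetical helper, and the seed and sample size are arbitrary.

```python
import random

def last_decimal_digit(x):
    """Hypothetical helper: final digit of the exact decimal expansion
    of a float x with 0 <= x < 1."""
    # x == m / 2**n in lowest terms, so m is odd whenever n >= 1.
    m, denominator = x.as_integer_ratio()
    n = denominator.bit_length() - 1
    # m / 2**n == (m * 5**n) / 10**n, so the last of the n decimal
    # places is the last digit of the integer m * 5**n. An odd number
    # times a power of 5 always ends in 5.
    return (m * 5**n) % 10

random.seed(0)  # arbitrary seed, only for reproducibility
digits = {last_decimal_digit(random.random()) for _ in range(10_000)}
print(digits)

# A couple of hand-checkable cases: 0.375 and 0.5.
print(last_decimal_digit(0.375), last_decimal_digit(0.5))
```

For an input of exactly 0.0 the helper returns 0, which matches the rare exception mentioned above.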

NOTE: As pointed out by user2357112, the correct implementations to look at are PyOS_double_to_string and format_float_short, but I'll leave the current one in because it's more pedagogically interesting.

answered Oct 10 '22 06:10

Mateen Ulhaq
