
Distribution of Number of Digits of Random Numbers

I encounter this curious phenomenon trying to implement a UUID generator in JavaScript.

Basically, in JavaScript, if I generate a large list of random numbers with the built-in Math.random() on Node 4.2.2:

var records = {};
var l;
for (var i=0; i < 1e6; i += 1) {
  l = String(Math.random()).length;
  if (records[l]) {
    records[l] += 1;
  } else {
    records[l] = 1;
  }
}
console.log(records);

The numbers of digits have a strange pattern:

{ '12': 1,
  '13': 11,
  '14': 65,
  '15': 663,
  '16': 6619,
  '17': 66378,
  '18': 611441,
  '19': 281175,
  '20': 30379,
  '21': 2939,
  '22': 282,
  '23': 44,
  '24': 3 }

I thought this was a quirk of V8's random number generator, but a similar pattern appears in Python 3.4.3:

12 : 2
13 : 5
14 : 64
15 : 672
16 : 6736
17 : 66861
18 : 610907
19 : 280945
20 : 30455
21 : 3129
22 : 224

And the Python code is as follows:

import random
random.seed()
records = {}
for i in range(0, 1000000):
    n = random.random()
    l = len(str(n))
    try:
        records[l] += 1
    except KeyError:
        records[l] = 1

for i in sorted(records):
    print(i, ':', records[i])

The pattern from 18 downward is expected: if a random number would have 20 digits but its last digit happens to be 0, its string representation effectively has only 19 digits. With a good random number generator, the probability of that happening is roughly 1/10 (and roughly 1/100 for two trailing zeros, and so on).
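One way to see this trimming directly: JavaScript prints the shortest decimal string that round-trips back to the same float, so a stringified random number in fixed notation never ends in a zero digit. A quick sanity-check sketch:

```javascript
// The shortest round-trip decimal never carries a trailing zero: a value
// whose 17th significant digit would be 0 simply prints with 16 digits.
var trailingZeros = 0;
for (var i = 0; i < 100000; i += 1) {
  var s = String(Math.random());
  // Skip the exact value 0 and the rare exponential forms (values below 1e-7).
  if (s === '0' || s.indexOf('e') !== -1) continue;
  if (s.charAt(s.length - 1) === '0') trailingZeros += 1;
}
console.log(trailingZeros); // 0
```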

But why is the pattern reversed for 19 and beyond?

I guess this is related to the binary representation of floating point numbers, but I can't figure out exactly why.

asked Nov 07 '15 by Andreas Blaesus



1 Answer

The reason is indeed related to the floating point representation. A floating point number can represent only a limited number of (binary) significant digits, and has a limited exponent range. When such a number is printed without scientific notation, some zeroes may be needed after the decimal point before the significant digits start.

You can visualize this effect by printing those random numbers which have the longest length when converted to string:

var records = {};
var l, r;
for (var i=0; i < 1e6; i += 1) {
    r = Math.random();
    l = String(r).length;
    if (l === 23) {
        console.log(r);
    }
    if (records[l]) {
        records[l] += 1;
    } else {
        records[l] = 1;
    }
}

This prints only the 23-character strings, and you will get numbers like these:

0.000007411070483631654
0.000053944830052166104
0.000018188989763578967
0.000029525788901141325
0.000009613635131744402
0.000005937417234758158
0.000021099748521158368

Notice the zeroes before the first non-zero digit. These are not actually stored in the significand of the floating point representation; they are implied by its exponent.
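A small sketch of this, using the first value from the list above: `toExponential` shows the stored significant digits without the padding zeros, and reading the raw IEEE-754 bits (via a `DataView`, which writes big-endian by default) exposes the binary exponent that implies them.

```javascript
var x = 0.000007411070483631654; // first value from the list above

// Same 16 significant digits, with and without the padding zeros:
console.log(String(x));         // "0.000007411070483631654" (23 characters)
console.log(x.toExponential()); // "7.411070483631654e-6"

// The zeros are implied by the binary exponent stored in the float:
var view = new DataView(new ArrayBuffer(8));
view.setFloat64(0, x);                                   // big-endian by default
var exp = ((view.getUint32(0) >>> 20) & 0x7ff) - 1023;   // unbiased exponent
console.log(exp);               // -18, i.e. x lies in [2^-18, 2^-17)
```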

If you were to take out the leading zeroes, and then make a count:

var records = {};
var l, r, s;
for (var i=0; i < 1e6; i += 1) {
    r = Math.random();
    s = String(r).replace(/^[0.]+/, ''); // strip the leading "0." and any leading zeros
    l = s.length;

    if (records[l]) {
        records[l] += 1;
    } else {
        records[l] = 1;
    }
}

... you'll get results which are less strange.

However, you will see some irregularity that is due to how JavaScript converts tiny numbers to string: when they get too small, scientific notation is used in the string representation. You can see this with the following script (not every engine necessarily uses the same breaking point, so you may need to play a bit with the number):

var i = 0.00000123456789012345678;
console.log(String(i), String(i/10));

This gives me the following output:

0.0000012345678901234567 1.2345678901234568e-7

So very small numbers get a rather fixed string length as a result, quite often 22 characters, while in non-scientific notation a length of 23 is common. This also influences the second script above: length 22 gets more hits than 23.
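For what it's worth, the ECMAScript specification pins down this breaking point for number-to-string conversion: fixed notation is used down to 1e-6, and exponential notation below that.

```javascript
// ECMAScript prints fixed notation down to 1e-6 and exponential below it.
console.log(String(0.000001));  // "0.000001"
console.log(String(0.0000001)); // "1e-7"
```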

It should be noted that JavaScript does not switch to scientific notation when converting a number to its binary string representation:

var i = 0.1234567890123456789e-120;
console.log(i.toString(2));

The above will print a string of over 450 binary digits!
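To check that claim, the length of the binary string can be measured directly; the exact count depends on the value, but for this one it is well over 450 characters (hundreds of leading zero bits after the point, followed by the mantissa bits):

```javascript
// toString(2) writes the tiny number out in full binary, leading zeros and all.
var tiny = 0.1234567890123456789e-120;
var bits = tiny.toString(2);
console.log(bits.length); // over 450 characters
```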

answered Oct 04 '22 by trincot