Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is random() * random() different to random() ** 2?

Is there are difference between random() * random() and random() ** 2? random() returns a value between 0 and 1 from a uniform distribution.

When testing both versions of random square numbers I noticed a little difference. I created 100000 random square numbers and counted how many numbers are in each interval of 0.01 (0.00 to 0.01, 0.01 to 0.02, ...). It seems that these versions of squared random number generation are different.

Squaring a random number instead of multiplying two random numbers has you reuse a random number, but I think the distribution should remain the same. Is there really a difference? If not, why is my test showing a difference?


I generate two random binned distributions for random() * random() and one for random() ** 2 like so:

from random import random

lst = [0 for i in range(100)]
lst2, lst3 = list(lst), list(lst)

#create two random distributions for random() * random()
for i in range(100000):
    lst[int(100 * random() * random())] += 1

for i in range(100000):
    lst2[int(100 * random() * random())] += 1

for i in range(100000):
    lst3[int(100 * random() ** 2)] += 1

which gives

>>> lst
[
    5626, 4139, 3705, 3348, 3085, 2933, 2725, 2539, 2449, 2413,
    2259, 2179, 2116, 2062, 1961, 1827, 1754, 1743, 1719, 1753,
    1522, 1543, 1513, 1361, 1372, 1290, 1336, 1274, 1219, 1178,
    1139, 1147, 1109, 1163, 1060, 1022, 1007,  952,  984,  957,
     906,  900,  843,  883,  802,  801,  710,  752,  705,  729,
     654,  668,  628,  633,  615,  600,  566,  551,  532,  541,
     511,  493,  465,  503,  450,  394,  405,  405,  404,  332,
     369,  369,  332,  316,  272,  284,  315,  257,  224,  230,
     221,  175,  209,  188,  162,  156,  159,  114,  131,  124,
     96,   94,   80,   73,   54,   45,   43,   23,   18,     3
]

>>> lst2
[
    5548, 4218, 3604, 3237, 3082, 2921, 2872, 2570, 2479, 2392,
    2296, 2205, 2113, 1990, 1901, 1814, 1801, 1714, 1660, 1591,
    1631, 1523, 1491, 1505, 1385, 1329, 1275, 1308, 1324, 1207,
    1209, 1208, 1117, 1136, 1015, 1080, 1001,  993,  958,  948,
     903,  843,  843,  849,  801,  799,  748,  729,  705,  660,
     701,  689,  676,  656,  632,  581,  564,  537,  517,  525,
     483,  478,  473,  494,  457,  422,  412,  390,  384,  352,
     350,  323,  322,  308,  304,  275,  272,  256,  246,  265,
     227,  204,  171,  191,  191,  136,  145,  136,  108,  117,
      93,   83,   74,   77,   55,   38,   32,   25,   21,    1
]

>>> lst3
[
    10047, 4198, 3214, 2696, 2369, 2117, 2010, 1869, 1752, 1653,
     1552, 1416, 1405, 1377, 1328, 1293, 1252, 1245, 1121, 1146,
     1047, 1051, 1123, 1100,  951,  948,  967,  933,  939,  925,
      940,  893,  929,  874,  824,  843,  868,  800,  844,  822,
      746,  733,  808,  734,  740,  682,  713,  681,  675,  686,
      689,  730,  707,  677,  645,  661,  645,  651,  649,  672,
      679,  593,  585,  622,  611,  636,  543,  571,  594,  593,
      629,  624,  593,  567,  584,  585,  610,  549,  553,  574,
      547,  583,  582,  553,  536,  512,  498,  562,  536,  523,
      553,  485,  503,  502,  518,  554,  485,  482,  470,  516
]

The expected random error is the difference in the first two:

[
    78,  79, 101, 111,   3,  12, 147,  31,  30,  21,
    37,  26,   3,  72,  60,  13,  47,  29,  59, 162,
   109,  20,  22, 144,  13,  39,  61,  34, 105,  29,
    70,  61,   8,  27,  45,  58,   6,  41,  26,   9,
     3,  57,   0,  34,   1,   2,  38,  23,   0,  69,
    47,  21,  48,  23,  17,  19,   2,  14,  15,  16,
    28,  15,   8,   9,   7,  28,   7,  15,  20,  20,
    19,  46,  10,   8,  32,   9,  43,   1,  22,  35,
     6,  29,  38,   3,  29,  20,  14,  22,  23,   7,
     3,  11,   6,   4,   1,   7,  11,   2,   3,   2
]

But the difference between the first and third is much larger, hinting that the distributions are different:

[
    4421,   59,  491,  652,  716,  816,  715,  670,  697,  760,
     707,  763,  711,  685,  633,  534,  502,  498,  598,  607,
     475,  492,  390,  261,  421,  342,  369,  341,  280,  253,
     199,  254,  180,  289,  236,  179,  139,  152,  140,  135,
     160,  167,   35,  149,   62,  119,    3,   71,   30,   43,
      35,   62,   79,   44,   30,   61,   79,  100,  117,  131,
     168,  100,  120,  119,  161,  242,  138,  166,  190,  261,
     260,  255,  261,  251,  312,  301,  295,  292,  329,  344,
     326,  408,  373,  365,  374,  356,  339,  448,  405,  399,
     457,  391,  423,  429,  464,  509,  442,  459,  452,  513
]
like image 392
Sirac Avatar asked Jun 29 '14 19:06

Sirac


2 Answers

Here are some plots:

All the possibilities for random() * random():

A 2D heatmap with most intensity in the top-right.

The x-axis is one random variable increasing rightwards, and the y-axis is another increasing upwards.

You can see that if either is low, the result will be low, and both have to be high to get a high result.

When the only decider is a single axis, as in the random() ** 2 case, you get

A 2D heatmap that increases quadratically from bottom to top, and is invariant in the x-axis

In this it is far more likely to get a very dark (large) value, as the whole top is dark, not just a corner.

When you make both linearized, with random() * random() on top:

A linearization of the first graphA linearization of the second graph

You see that the distributions are indeed different.

Code:

import numpy
import matplotlib
from matplotlib import pyplot
import matplotlib.cm

def make_fig(name, data):
    figure = matplotlib.pyplot.figure()
    print(data.shape)
    figure.set_size_inches(data.shape[1]//100, data.shape[0]//100)

    axes = matplotlib.pyplot.Axes(figure, [0, 0, 1, 1])
    axes.set_axis_off()
    figure.add_axes(axes)

    axes.imshow(data, origin="lower", cmap=matplotlib.cm.Greys, aspect="auto")
    figure.savefig(name, dpi=200)

xs, ys = numpy.mgrid[:1000, :1000]
two_random = xs * ys

make_fig("two_random.png", two_random)

two_random_flat = two_random.flatten()
two_random_flat.sort()
two_random_flat = two_random_flat[::1000]

make_fig("two_random_1D.png", numpy.tile(two_random_flat, (100, 1)))

one_random = xs * xs

make_fig("one_random.png", one_random)

one_random_flat = one_random.flatten()
one_random_flat.sort()
one_random_flat = one_random_flat[::1000]

make_fig("one_random_1D.png", numpy.tile(one_random_flat, (100, 1)))

You can also approach this mathematically. The probability of getting a value less than x, with 0 ≤ x ≤ 1 is

For random()²:

√x

as the probability the random value being lower than x is the probability that random()² < x.

For random() · random():

Given the first random variable is r and the second is R, we can find the probability that Rr < x with a fixed R:

P(Rr < x)
= P(r < x/R)
= 1 if x > R (and so x/R > 1)
or
= x/R otherwise

So we want

∫ P(Rr < x) dR from R=0 to R=1

= ∫ 1   dR from R=0 to R=x
+ ∫ x/R dR from R=x to R=1

= x(1 - ln R)

As we can see, √x ≠ x(1 - ln R).

These distributions show up as:

Probability that the function is less than a given value

The y-axis gives the probability that the line (random()² or random() · random()) is less than the x axis.

We see that for the random() · random(), the probability of large numbers is significantly less.

Density functions

I guess the most revealing thing is to differentiate (½x ^ -½ and - ln x) and plot the probability density functions:

Probabilities of each number's occurring

This shows the probability of each x in relative terms. So the probability that x is large (> 0.5) is about twice for the random()² variant.

like image 114
Veedrac Avatar answered Nov 06 '22 05:11

Veedrac


Let's simplify the problem somewhat. Consider throwing two dice and multiplying the result against throwing one die and squaring it. In the first case you have a 1 in 36 chance of throwing a double 1, therefore a 1 in 36 chance the product is 1. On the other hand the second case obviously has a 1 in 6 chance that the square is 1. The same applies for a double 6, so the extremes are much more probable when squaring.

The same follows when you use random floats: you are much less likely to get two random values at the extremes than you are to get a single value, so very small or very large values will come up much more often when squaring then when multiplying two independent values.

like image 31
Duncan Avatar answered Nov 06 '22 05:11

Duncan