Why is random() * random() different to random() ** 2?

Question

Is there a difference between random() * random() and random() ** 2? random() returns a value between 0 and 1 from a uniform distribution.

When testing both ways of generating random squares, I noticed a small difference. I generated 100000 random squares with each method and counted how many fell into each interval of width 0.01 (0.00 to 0.01, 0.01 to 0.02, ...). The two methods seem to produce different distributions.

Squaring one random number instead of multiplying two means reusing the same random number, but I expected the distribution to stay the same. Is there really a difference? If not, why does my test show one?

I generate two random binned distributions for random() * random() and one for random() ** 2 like so:

from random import random

# 100 bins, each of width 0.01
lst = [0 for i in range(100)]
lst2, lst3 = list(lst), list(lst)

# create two random distributions for random() * random()
for i in range(100000):
    lst[int(100 * random() * random())] += 1

for i in range(100000):
    lst2[int(100 * random() * random())] += 1

# create one random distribution for random() ** 2
for i in range(100000):
    lst3[int(100 * random() ** 2)] += 1

which gives

>>> lst
[
    5626, 4139, 3705, 3348, 3085, 2933, 2725, 2539, 2449, 2413,
    2259, 2179, 2116, 2062, 1961, 1827, 1754, 1743, 1719, 1753,
    1522, 1543, 1513, 1361, 1372, 1290, 1336, 1274, 1219, 1178,
    1139, 1147, 1109, 1163, 1060, 1022, 1007,  952,  984,  957,
     906,  900,  843,  883,  802,  801,  710,  752,  705,  729,
     654,  668,  628,  633,  615,  600,  566,  551,  532,  541,
     511,  493,  465,  503,  450,  394,  405,  405,  404,  332,
     369,  369,  332,  316,  272,  284,  315,  257,  224,  230,
     221,  175,  209,  188,  162,  156,  159,  114,  131,  124,
     96,   94,   80,   73,   54,   45,   43,   23,   18,     3
]

>>> lst2
[
    5548, 4218, 3604, 3237, 3082, 2921, 2872, 2570, 2479, 2392,
    2296, 2205, 2113, 1990, 1901, 1814, 1801, 1714, 1660, 1591,
    1631, 1523, 1491, 1505, 1385, 1329, 1275, 1308, 1324, 1207,
    1209, 1208, 1117, 1136, 1015, 1080, 1001,  993,  958,  948,
     903,  843,  843,  849,  801,  799,  748,  729,  705,  660,
     701,  689,  676,  656,  632,  581,  564,  537,  517,  525,
     483,  478,  473,  494,  457,  422,  412,  390,  384,  352,
     350,  323,  322,  308,  304,  275,  272,  256,  246,  265,
     227,  204,  171,  191,  191,  136,  145,  136,  108,  117,
      93,   83,   74,   77,   55,   38,   32,   25,   21,    1
]

>>> lst3
[
    10047, 4198, 3214, 2696, 2369, 2117, 2010, 1869, 1752, 1653,
     1552, 1416, 1405, 1377, 1328, 1293, 1252, 1245, 1121, 1146,
     1047, 1051, 1123, 1100,  951,  948,  967,  933,  939,  925,
      940,  893,  929,  874,  824,  843,  868,  800,  844,  822,
      746,  733,  808,  734,  740,  682,  713,  681,  675,  686,
      689,  730,  707,  677,  645,  661,  645,  651,  649,  672,
      679,  593,  585,  622,  611,  636,  543,  571,  594,  593,
      629,  624,  593,  567,  584,  585,  610,  549,  553,  574,
      547,  583,  582,  553,  536,  512,  498,  562,  536,  523,
      553,  485,  503,  502,  518,  554,  485,  482,  470,  516
]

The absolute difference between the first two gives a sense of the expected random error:

[
    78,  79, 101, 111,   3,  12, 147,  31,  30,  21,
    37,  26,   3,  72,  60,  13,  47,  29,  59, 162,
   109,  20,  22, 144,  13,  39,  61,  34, 105,  29,
    70,  61,   8,  27,  45,  58,   6,  41,  26,   9,
     3,  57,   0,  34,   1,   2,  38,  23,   0,  69,
    47,  21,  48,  23,  17,  19,   2,  14,  15,  16,
    28,  15,   8,   9,   7,  28,   7,  15,  20,  20,
    19,  46,  10,   8,  32,   9,  43,   1,  22,  35,
     6,  29,  38,   3,  29,  20,  14,  22,  23,   7,
     3,  11,   6,   4,   1,   7,  11,   2,   3,   2
]

But the difference between the first and third is much larger, hinting that the distributions are different:

[
    4421,   59,  491,  652,  716,  816,  715,  670,  697,  760,
     707,  763,  711,  685,  633,  534,  502,  498,  598,  607,
     475,  492,  390,  261,  421,  342,  369,  341,  280,  253,
     199,  254,  180,  289,  236,  179,  139,  152,  140,  135,
     160,  167,   35,  149,   62,  119,    3,   71,   30,   43,
      35,   62,   79,   44,   30,   61,   79,  100,  117,  131,
     168,  100,  120,  119,  161,  242,  138,  166,  190,  261,
     260,  255,  261,  251,  312,  301,  295,  292,  329,  344,
     326,  408,  373,  365,  374,  356,  339,  448,  405,  399,
     457,  391,  423,  429,  464,  509,  442,  459,  452,  513
]

Veedrac · Accepted Answer

Here are some plots:

All the possibilities for random() * random():

[Figure: 2D heatmap of random() * random(), with most intensity in the top-right corner]

The x-axis is one random variable increasing rightwards, and the y-axis is another increasing upwards.

You can see that if either is low, the result will be low, and both have to be high to get a high result.

When the only decider is a single axis, as in the random() ** 2 case, you get

[Figure: 2D heatmap of random() ** 2 that increases quadratically from bottom to top and is invariant along the x-axis]

Here a very dark (large) value is far more likely, because the whole top edge is dark, not just one corner.

When both are linearized (all values sorted into a single strip), with random() * random() on top:

[Figure: linearization of the first graph (top) and of the second graph (bottom)]

You see that the distributions are indeed different.

Code:

import numpy
import matplotlib.pyplot as pyplot

def make_fig(name, data):
    # One inch per 100 data points in each direction.
    figure = pyplot.figure()
    figure.set_size_inches(data.shape[1] // 100, data.shape[0] // 100)

    # A single axes filling the whole figure, with no ticks or frame.
    axes = figure.add_axes([0, 0, 1, 1])
    axes.set_axis_off()

    # Greys: larger values are darker; origin="lower" draws row 0 at the bottom.
    axes.imshow(data, origin="lower", cmap="Greys", aspect="auto")
    figure.savefig(name, dpi=200)
    pyplot.close(figure)

# Every pair of values on a 1000 x 1000 grid: xs[i, j] = i increases from
# bottom to top, ys[i, j] = j increases from left to right.
xs, ys = numpy.mgrid[:1000, :1000]
two_random = xs * ys

make_fig("two_random.png", two_random)

# Sort all products and keep every 1000th to get 1000 evenly spaced quantiles,
# then repeat that row 100 times so it can be drawn as a strip.
two_random_flat = numpy.sort(two_random.flatten())[::1000]

make_fig("two_random_1D.png", numpy.tile(two_random_flat, (100, 1)))

# Squaring uses a single variable, so the image only varies from bottom to top.
one_random = xs * xs

make_fig("one_random.png", one_random)

one_random_flat = numpy.sort(one_random.flatten())[::1000]

make_fig("one_random_1D.png", numpy.tile(one_random_flat, (100, 1)))
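A quicker numeric check than the plots is to compare sample means. For independent uniform U and V on [0, 1), E[U·V] = E[U]·E[V] = 1/4, while E[U²] = ∫₀¹ x² dx = 1/3. The sketch below assumes a hypothetical helper `sample_mean` (not part of either answer) and an arbitrary seed and sample size:

```python
import random

def sample_mean(draw, n=200_000, seed=12345):
    """Estimate the mean of draw(rng) from n samples using a seeded generator."""
    rng = random.Random(seed)
    return sum(draw(rng) for _ in range(n)) / n

# Two independent uniforms multiplied: exact mean is 1/2 * 1/2 = 1/4.
product_mean = sample_mean(lambda rng: rng.random() * rng.random())

# One uniform reused (squared): exact mean is the integral of x^2 over [0, 1] = 1/3.
square_mean = sample_mean(lambda rng: rng.random() ** 2)

print(f"random() * random(): {product_mean:.4f} (exact 0.2500)")
print(f"random() ** 2:       {square_mean:.4f} (exact 0.3333)")
```

With this many samples the standard error of each estimate is well under 0.001, so a gap of about 0.08 between the two means cannot be sampling noise.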

You can also approach this mathematically. The probability of getting a value less than x, with 0 ≤ x ≤ 1, is:

For `random()²`:

√x

since random()² < x exactly when random() < √x, and a uniform random value is below √x with probability √x.
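This result can be checked empirically: count how often a squared uniform sample falls below a few thresholds and compare with √x. This is a rough Monte Carlo sketch; the thresholds, sample size and seed are arbitrary choices.

```python
import math
import random

rng = random.Random(2024)  # seeded so the run is reproducible
n = 200_000
samples = [rng.random() ** 2 for _ in range(n)]

thresholds = (0.01, 0.25, 0.5, 0.81)
# Fraction of squared samples strictly below each threshold.
empirical_cdf = {x: sum(s < x for s in samples) / n for x in thresholds}

for x, p in empirical_cdf.items():
    print(f"P(random()**2 < {x}): sampled {p:.4f}, sqrt(x) = {math.sqrt(x):.4f}")
```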

For `random() · random()`:

Given the first random variable is r and the second is R, we can find the probability that Rr < x with a fixed R:

P(Rr < x)
= P(r < x/R)
= 1 if x > R (and so x/R > 1)
or
= x/R otherwise

So we want

∫ P(Rr < x) dR from R=0 to R=1

= ∫ 1   dR from R=0 to R=x
+ ∫ x/R dR from R=x to R=1

= x + x(ln 1 - ln x)
= x(1 - ln x)

As we can see, √x ≠ x(1 - ln x).
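The integral for random() · random() can also be checked numerically: average the conditional probability P(r < x/R) = min(1, x/R) over evenly spaced values of R in (0, 1) (a midpoint rule) and compare with the closed form x(1 - ln x). The helper names `product_cdf_numeric` and `product_cdf_exact` are hypothetical, chosen for this sketch.

```python
import math

def product_cdf_numeric(x, steps=100_000):
    """Midpoint-rule estimate of the integral of min(1, x/R) dR over R in (0, 1)."""
    total = 0.0
    for i in range(steps):
        R = (i + 0.5) / steps       # midpoint of the i-th subinterval
        total += min(1.0, x / R)    # P(r < x/R) for this fixed R
    return total / steps

def product_cdf_exact(x):
    """Closed form: P(random() * random() < x) = x(1 - ln x) for 0 < x <= 1."""
    return x * (1 - math.log(x))

for x in (0.01, 0.25, 0.5, 0.9):
    print(x, round(product_cdf_numeric(x), 5), round(product_cdf_exact(x), 5))
```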

These distributions show up as:

[Figure: probability that each function is less than a given value]

The y-axis gives the probability that the value (random()² or random() · random()) is less than the value on the x-axis.

We see that random() · random() is significantly less likely to produce large values.
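These cumulative distribution functions, F(x) = √x for random()² and F(x) = x(1 - ln x) for random() · random() (with F(0) = 0), also predict the histograms in the question. With N = 100000 samples and bins of width 0.01, the expected count in bin k is N·(F((k+1)/100) - F(k/100)). A sketch, where `expected_counts` is a hypothetical helper:

```python
import math

def square_cdf(x):
    # P(random() ** 2 < x) = sqrt(x)
    return math.sqrt(x)

def product_cdf(x):
    # P(random() * random() < x) = x(1 - ln x); its limit at x = 0 is 0
    return 0.0 if x == 0 else x * (1 - math.log(x))

def expected_counts(cdf, n=100_000, bins=100):
    """Expected number of samples in each bin [k/bins, (k+1)/bins)."""
    return [n * (cdf((k + 1) / bins) - cdf(k / bins)) for k in range(bins)]

square_expected = expected_counts(square_cdf)
product_expected = expected_counts(product_cdf)

# First bins: compare with lst3[0] = 10047 and lst[0] = 5626, lst2[0] = 5548.
print(round(square_expected[0]), round(product_expected[0]))
# 10000 5605
```

The predicted first bins match the question's counts to within sampling noise, which confirms that the large gap in the first bin is a real difference between the distributions.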

Density functions

I guess the most revealing thing is to differentiate the two cumulative probabilities (giving ½x^(-½) and -ln x) and plot the probability density functions:

[Figure: probability density of each value for random()² and random() · random()]

This shows the relative likelihood of each x. The probability that x is large (> 0.5) is about twice as high for the random()² variant.
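The "about twice" figure follows directly from the two cumulative probabilities, since P(X > 0.5) = 1 - F(0.5):

```python
import math

# P(random() ** 2 > 0.5) = 1 - sqrt(0.5)
p_square = 1 - math.sqrt(0.5)
# P(random() * random() > 0.5) = 1 - 0.5 * (1 - ln 0.5)
p_product = 1 - 0.5 * (1 - math.log(0.5))

print(f"{p_square:.4f} {p_product:.4f} ratio {p_square / p_product:.2f}")
# 0.2929 0.1534 ratio 1.91
```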

Duncan · Answer

Let's simplify the problem somewhat. Compare throwing two dice and multiplying the results with throwing one die and squaring the result. In the first case you have a 1 in 36 chance of throwing a double 1, therefore a 1 in 36 chance the product is 1. On the other hand, the second case obviously has a 1 in 6 chance that the square is 1. The same applies for a double 6, so the extremes are much more probable when squaring.

The same follows when you use random floats: two independent values are much less likely to both land at an extreme than a single value is, so very small or very large values come up much more often when squaring than when multiplying two independent values.
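The dice analogy is small enough to enumerate exactly. This sketch uses `fractions.Fraction` to compare the distribution of the product of two independent dice with the square of a single die:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Two independent dice: all 36 ordered pairs are equally likely.
two_dice = Counter(a * b for a, b in product(faces, repeat=2))
p_two = {v: Fraction(c, 36) for v, c in two_dice.items()}

# One die squared: each of the 6 faces is equally likely.
p_one = {f * f: Fraction(1, 6) for f in faces}

for extreme in (1, 36):
    print(extreme, p_two[extreme], p_one[extreme])
# 1 1/36 1/6
# 36 1/36 1/6
```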

Tags: python, random, random-sample

Sirac
