How would I make a list of N (say 100) random numbers, so that their sum is 1?
I can make a list of random numbers with
r = [ran.random() for i in range(1,100)]
How would I modify this so that the list sums to 1? (This is for a probability simulation.)
A typical random number generator (Python's random.random(), for example) returns a value in the half-open interval [0, 1): it can equal 0 but never 1. Any positive number multiplied by such a value will therefore always be less than that number, never more, and never equal.
Enter the formula =randbetween(50,150) in cells G11:G16, and =sum(g11:g16) in cell G18. This will generate 6 random numbers between 50 and 150, and their sum. If your target value is in D18, then the following will refresh the random values until the values in D18 and G18 are equal.
The random.uniform() function is well suited to generating a random number between 0 and 1, as it returns a random floating-point number between the two numbers passed as its parameters.
The simplest solution is indeed to take N random values and divide by the sum.
A more generic solution is to use the Dirichlet distribution which is available in numpy.
By changing the parameters of the distribution you can change the "randomness" of individual numbers:
>>> import numpy as np, numpy.random
>>> print(np.random.dirichlet(np.ones(10), size=1))
[[ 0.01779975 0.14165316 0.01029262 0.168136 0.03061161 0.09046587
0.19987289 0.13398581 0.03119906 0.17598322]]
>>> print(np.random.dirichlet(np.ones(10)/1000., size=1))
[[ 2.63435230e-115 4.31961290e-209 1.41369771e-212 1.42417285e-188
0.00000000e+000 5.79841280e-143 0.00000000e+000 9.85329725e-005
9.99901467e-001 8.37460207e-246]]
>>> print(np.random.dirichlet(np.ones(10)*1000., size=1))
[[ 0.09967689 0.10151585 0.10077575 0.09875282 0.09935606 0.10093678
0.09517132 0.09891358 0.10206595 0.10283501]]
Depending on the concentration parameter, the Dirichlet distribution will give vectors where all the values are close to 1./N (N being the length of the vector), vectors where most of the values are ~0 and a single value is close to 1, or something in between those possibilities.
EDIT (5 years after the original answer): Another useful fact about the Dirichlet distribution is that you naturally get it, if you generate a Gamma-distributed set of random variables and then divide them by their sum.
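The Gamma construction mentioned in the edit can be sketched as follows. This is a minimal illustration, not code from the original answer: the names `rng`, `alpha`, `g` and `p` are made up for the example, and it assumes a NumPy version that provides `np.random.default_rng`.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
alpha = np.ones(10)              # one concentration parameter per entry

# Draw independent Gamma(alpha_i, 1) variables ...
g = rng.gamma(shape=alpha, scale=1.0)
# ... and divide by their sum; p then follows Dirichlet(alpha).
p = g / g.sum()

print(p, p.sum())
```

Changing `alpha` here has the same effect as changing the argument of `np.random.dirichlet` in the examples above.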
The simplest way to do this is to make a list of as many numbers as you wish, then divide them all by the sum. The results are still random, and they add up to 1 (up to floating-point rounding).
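import random as ran  # assumed alias; the question uses `ran` without showing the import
r = [ran.random() for i in range(100)]  # range(100) gives 100 values; range(1,100) gives only 99
s = sum(r)
r = [i / s for i in r]  # rescale so the list sums to 1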
or, as suggested by @TomKealy, keep the sum and creation in one loop:
rs = []
s = 0
for i in range(100):
    r = ran.random()
    s += r
    rs.append(r)
rs = [r / s for r in rs]  # normalize once the total is known
For the fastest performance, use numpy:
import numpy as np
a = np.random.random(100)
a /= a.sum()
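And you can give the random numbers any distribution you want, as long as the values are non-negative if the result is meant to be a probability distribution:
a = np.abs(np.random.normal(size=100))  # abs() because raw normal draws can be negative
a /= a.sum()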
---- Timing ----
In [52]: %%timeit
...: r = [ran.random() for i in range(1,100)]
...: s = sum(r)
...: r = [ i/s for i in r ]
....:
1000 loops, best of 3: 231 µs per loop
In [53]: %%timeit
....: rs = []
....: s = 0
....: for i in range(100):
....:     r = ran.random()
....:     s += r
....:     rs.append(r)
....:
10000 loops, best of 3: 39.9 µs per loop
In [54]: %%timeit
....: a = np.random.random(100)
....: a /= a.sum()
....:
10000 loops, best of 3: 21.8 µs per loop
Dividing each number by the total may not give you the distribution you want. For example, with two numbers, the pair x,y = random.random(), random.random() picks a point uniformly on the square 0<=x<1, 0<=y<1. Dividing by the sum "projects" that point (x,y) onto the line x+y=1 along the line from (x,y) to the origin. Points near (0.5,0.5) will be much more likely than points near (0.1,0.9).
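A quick simulation makes the bias visible. This is only a sketch: the window width of 0.05 and the names `near_center` and `near_edge` are arbitrary choices for the illustration.

```python
import random

random.seed(0)          # fixed seed so the run is repeatable
trials = 200_000
near_center = 0         # normalized x lands within 0.05 of 0.5
near_edge = 0           # normalized x lands within 0.05 of 0.1

for _ in range(trials):
    x, y = random.random(), random.random()
    total = x + y
    if total == 0.0:    # practically impossible, but avoids dividing by zero
        continue
    u = x / total       # first coordinate after dividing by the sum
    if abs(u - 0.5) < 0.05:
        near_center += 1
    elif abs(u - 0.1) < 0.05:
        near_edge += 1

print(near_center, near_edge)
```

If the normalized points were uniform on the segment, the two counts would be about equal. Here the center window should collect clearly more hits than the edge window.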
For two variables, then, x = random.random(), y=1-x gives a uniform distribution along the geometrical line segment.
With 3 variables, you are picking a random point in a cube and projecting it (radially, through the origin) onto a triangle in the plane x+y+z=1, and points near the center of the triangle will be more likely than points near the vertices. If you need an unbiased choice of points in that triangle, scaling is no good.
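One standard way to get an unbiased point on that triangle, which is not the method this answer goes on to describe, is a Dirichlet distribution with all parameters equal to 1, since that distribution is uniform over the simplex. A minimal sketch, assuming NumPy is available (the names `rng` and `points` are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Dirichlet(1, 1, 1) is the uniform distribution on the triangle x + y + z = 1.
points = rng.dirichlet(np.ones(3), size=5)
row_sums = points.sum(axis=1)   # each row should sum to 1

print(points)
print(row_sums)
```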
The problem gets complicated in n-dimensions, but you can get a low-precision (but high accuracy, for all you laboratory science fans!) estimate by picking uniformly from the set of all n-tuples of non-negative integers adding up to N, and then dividing each of them by N.
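One way to pick such an integer tuple uniformly is the classic "stars and bars" construction, sketched below. This is an assumption-labeled illustration and not necessarily the algorithm from the linked answer. The function name `random_composition` and its parameters are hypothetical.

```python
import random

def random_composition(n, total, rng=random):
    """Return n non-negative integers summing to `total`, chosen uniformly
    among all such tuples (stars-and-bars construction)."""
    # Lay out total stars and n - 1 bars in total + n - 1 slots;
    # choosing the bar positions uniformly picks a tuple uniformly.
    bars = sorted(rng.sample(range(total + n - 1), n - 1))
    parts = []
    prev = -1
    for b in bars:
        parts.append(b - prev - 1)       # stars between consecutive bars
        prev = b
    parts.append(total + n - 2 - prev)   # stars after the last bar
    return parts

random.seed(0)
N = 1_000_000
counts = random_composition(100, N)
probs = [c / N for c in counts]          # 100 values with 6-digit resolution
print(sum(counts), sum(probs))
```

Each entry of `probs` is a multiple of 1/N, which is where the "low precision" comes from, while the integer counts add up to exactly N.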
I recently came up with an algorithm to do that for modest-sized n, N. It should work for n=100 and N = 1,000,000 to give you 6-digit randoms. See my answer at:
Create constrained random numbers?