I would like to generate n random numbers e.g., n=200
, where the range of possible values is between 2 and 40 with a mean of 12 and median is 6.5.
I searched everywhere and i could not find a solution for this. I tried the following script by it works for small numbers such as 20, for big numbers it takes ages and result is returned.
n=200
x = np.random.randint(0,1,size=n) # initalisation only
while True:
if x.mean() == 12 and np.median(x) == 6.5:
break
else:
x=np.random.randint(2,40,size=n)
Could anyone help me by improving this to get a quick result even when n=5000 or so?
Use a random. randrange() function to get a random integer number from the given exclusive range by specifying the increment. For example, random. randrange(0, 10, 2) will return any random number between 0 and 20 (like 0, 2, 4, 6, 8).
I need to know how to generate 1000 random numbers between 500 and 600 that has a mean = 550 and standard deviation = 30 in python. import pylab import random xrandn = pylab. zeros(1000,float) for j in range(500,601): xrandn[j] = pylab.
Use the formula "=NORMINV(RAND(),B2,C2)", where the RAND() function creates your probability, B2 provides your mean and C2 references your standard deviation. You can change B2 and C2 to reference different cells or enter the values into the formula itself.
Here, you want a median value lesser than the mean value. That means that a uniform distribution is not appropriate: you want many little values and fewer great ones.
Specifically, you want as many value lesser or equal to 6 as the number of values greater or equal to 7.
A simple way to ensure that the median will be 6.5 is to have the same number of values in the range [ 2 - 6 ] as in [ 7 - 40 ]. If you choosed uniform distributions in both ranges, you would have a theorical mean of 13.75, which is not that far from the required 12.
A slight variation on the weights can make the theorical mean even closer: if we use [ 5, 4, 3, 2, 1, 1, ..., 1 ] for the relative weights of the random.choices
of the [ 7, 8, ..., 40 ] range, we find a theorical mean of 19.98 for that range, which is close enough to the expected 20.
Example code:
>>> pop1 = list(range(2, 7))
>>> pop2 = list(range(7, 41))
>>> w2 = [ 5, 4, 3, 2 ] + ( [1] * 30)
>>> r1 = random.choices(pop1, k=2500)
>>> r2 = random.choices(pop2, w2, k=2500)
>>> r = r1 + r2
>>> random.shuffle(r)
>>> statistics.mean(r)
12.0358
>>> statistics.median(r)
6.5
>>>
So we now have a 5000 values distribution that has a median of exactly 6.5 and a mean value of 12.0358 (this one is random, and another test will give a slightly different value). If we want an exact mean of 12, we just have to tweak some values. Here sum(r)
is 60179 when it should be 60000, so we have to decrease 175 values which were neither 2 (would go out of range) not 7 (would change the median).
In the end, a possible generator function could be:
def gendistrib(n):
if n % 2 != 0 :
raise ValueError("gendistrib needs an even parameter")
n2 = n//2 # n / 2 in Python 2
pop1 = list(range(2, 7)) # lower range
pop2 = list(range(7, 41)) # upper range
w2 = [ 5, 4, 3, 2 ] + ( [1] * 30) # weights for upper range
r1 = random.choices(pop1, k=n2) # lower part of the distrib.
r2 = random.choices(pop2, w2, k=n2) # upper part
r = r1 + r2
random.shuffle(r) # randomize order
# time to force an exact mean
tot = sum(r)
expected = 12 * n
if tot > expected: # too high: decrease some values
for i, val in enumerate(r):
if val != 2 and val != 7:
r[i] = val - 1
tot -= 1
if tot == expected:
random.shuffle(r) # shuffle again the decreased values
break
elif tot < expected: # too low: increase some values
for i, val in enumerate(r):
if val != 6 and val != 40:
r[i] = val + 1
tot += 1
if tot == expected:
random.shuffle(r) # shuffle again the increased values
break
return r
It is really fast: I could timeit gendistrib(10000)
at less than 0.02 seconds. But it should not be used for small distributions (less than 1000)
One way to get a result really close to what you want is to generate two separate random ranges with length 100 that satisfies your median constraints and includes all the desire range of numbers. Then by concatenating the arrays the mean will be around 12 but not quite equal to 12. But since it's just mean that you're dealing with you can simply generate your expected result by tweaking one of these arrays.
In [162]: arr1 = np.random.randint(2, 7, 100)
In [163]: arr2 = np.random.randint(7, 40, 100)
In [164]: np.mean(np.concatenate((arr1, arr2)))
Out[164]: 12.22
In [166]: np.median(np.concatenate((arr1, arr2)))
Out[166]: 6.5
Following is a vectorized and very much optimized solution against any other solution that uses for loops or python-level code by constraining the random sequence creation:
import numpy as np
import math
def gen_random():
arr1 = np.random.randint(2, 7, 99)
arr2 = np.random.randint(7, 40, 99)
mid = [6, 7]
i = ((np.sum(arr1 + arr2) + 13) - (12 * 200)) / 40
decm, intg = math.modf(i)
args = np.argsort(arr2)
arr2[args[-41:-1]] -= int(intg)
arr2[args[-1]] -= int(np.round(decm * 40))
return np.concatenate((arr1, mid, arr2))
Demo:
arr = gen_random()
print(np.median(arr))
print(arr.mean())
6.5
12.0
The logic behind the function:
In order for us to have a random array with that criteria we can concatenate 3 arrays together arr1
, mid
and arr2
. arr1
and arr2
each hold 99 items and the mid
holds 2 items 6 and 7 so that make the final result to give as 6.5 as the median. Now we an create two random arrays each with length 99. All we need to do to make the result to have a 12 mean is to find the difference between the current sum and 12 * 200
and subtract the result from our N largest numbers which in this case we can choose them from arr2
and use N=50
.
Edit:
If it's not a problem to have float numbers in your result you can actually shorten the function as following:
import numpy as np
import math
def gen_random():
arr1 = np.random.randint(2, 7, 99).astype(np.float)
arr2 = np.random.randint(7, 40, 99).astype(np.float)
mid = [6, 7]
i = ((np.sum(arr1 + arr2) + 13) - (12 * 200)) / 40
args = np.argsort(arr2)
arr2[args[-40:]] -= i
return np.concatenate((arr1, mid, arr2))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With