
python 3.1 - Creating normal distribution

Tags:

python

I have scipy and numpy, Python v3.1

I need to create a 1D array of length 3 million, using random numbers between 100 and 60,000 (inclusive). It has to fit a normal distribution.

Using 'a = numpy.random.standard_normal(3000000)', I get a normal distribution of the required length, but I'm not sure how to achieve the required range.

jimy asked Jan 15 '11 03:01


2 Answers

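A standard normal distribution has mean 0 and standard deviation 1. What I understand from your requirements is that you need one centred on the middle of your range, i.e. with mean (60000+100)/2, and with a standard deviation chosen relative to the half-range (60000-100)/2. Take each value from the standard_normal() result, multiply it by the new standard deviation (not the variance), and add the new mean.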

I haven't used NumPy, but a quick search of the docs says that you can achieve what you want directly by using numpy.random.normal(), which takes the mean (loc) and the standard deviation (scale) as arguments.
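A minimal sketch of the two routes described above, assuming a newer NumPy that provides numpy.random.default_rng (the older numpy.random.standard_normal and numpy.random.normal calls take the same arguments). The value of sd is only an illustrative choice, not something given in the question.

```python
import numpy

n = 3000000
mean = (60000 + 100) / 2.0   # midpoint of the requested range
sd = (60000 - 100) / 12.0    # assumption: the half-range spans 6 standard deviations

rng = numpy.random.default_rng(0)  # seeded only so the sketch is repeatable

# Route 1: rescale standard normal draws by hand.
a = mean + sd * rng.standard_normal(n)

# Route 2: let normal() apply the same shift and scale.
b = rng.normal(loc=mean, scale=sd, size=n)

print(a.mean(), a.std())
print(b.mean(), b.std())
```

Both arrays should report a sample mean close to mean and a sample standard deviation close to sd; neither is guaranteed to stay inside 100-60,000.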

One last tidbit: normal distributions are not bounded. That means no value has zero probability density, so any value can turn up, however far it is from the mean. Your requirements should be in terms of means and variances (or standard deviations), and not of limits.

Apalala answered Oct 08 '22 07:10


If you want a truly random normal distribution, you can't guarantee how far the numbers will spread. You can reduce the probability of outliers, however, by specifying the standard deviation.

>>> n = 3000000
>>> sigma5 = 1.0 / 1744278  # Chance of a value falling more than 5 std. dev. from the mean
>>> n * sigma5
1.7199093263803131  # Expect about 1.7 values in 3 million beyond 5 std. dev.
>>> sigma6 = 1.0 / 506800000
>>> n * sigma6
0.0059194948697711127  # Expect about 0.006 values in 3 million beyond 6 std. dev.
>>> sigma7 = 1.0 / 390600000000
>>> n * sigma7
7.6804915514592934e-06  # Expect about 0.000008 values in 3 million beyond 7 std. dev.

Therefore, in this case, ensuring that the standard deviation is only 1/6 or 1/7 of half the range will give you reasonable confidence that your data will not exceed the range.
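The odds used above (1 in 1744278 and so on) can be recomputed rather than looked up. A small sketch using only the standard library, where two_sided_tail is a helper name made up for this example:

```python
import math

n = 3000000

def two_sided_tail(k):
    """Probability that a normal value lies more than k std. dev. from the mean."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2.0))

for k in (5, 6, 7):
    p = two_sided_tail(k)
    # Print k, the tail probability, and the expected count among n draws.
    print(k, p, n * p)
```

The expected counts it prints should agree with the rounded figures in the session above.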

>>> import numpy
>>> span = 60000 - 100  # Named span so the built-in range() is not shadowed
>>> spread = (span / 2) / 6  # Anything outside of the range will be more than six std. dev. from the mean
>>> mean = (60000 + 100) / 2
>>> a = numpy.random.normal(loc=mean, scale=spread, size=n)
>>> a.min()
6320.0238199673404
>>> a.max()
55044.015566089176

Of course, you can still get values that fall outside the range here.
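If the limits really are hard limits, one option is to redraw any value that lands outside them. This is only a sketch, assuming a NumPy recent enough to have numpy.random.default_rng; low and high are names introduced for the example. Strictly speaking the result is a truncated normal rather than a true normal, although with the limits six standard deviations out the difference is negligible.

```python
import numpy

low, high = 100, 60000
n = 3000000
mean = (high + low) / 2.0
spread = (high - low) / 2.0 / 6.0  # limits sit six std. dev. from the mean

rng = numpy.random.default_rng(0)  # seeded only so the sketch is repeatable
a = rng.normal(loc=mean, scale=spread, size=n)

# Redraw only the out-of-range entries until none remain.
outside = (a < low) | (a > high)
while outside.any():
    a[outside] = rng.normal(loc=mean, scale=spread, size=int(outside.sum()))
    outside = (a < low) | (a > high)

print(a.min(), a.max())
```

With these settings the loop will usually not run at all, since a value beyond six standard deviations is expected only about 0.006 times in 3 million draws.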

fmark answered Oct 08 '22 08:10