
Numpy mean AND variance from single function?

Using Numpy/Python, is it possible to return the mean AND variance from a single function call?

I know that I can compute them separately, but the mean is required to calculate the sample standard deviation. So if I use separate functions to get the mean and the variance, I am adding unnecessary overhead, because the mean gets computed twice.

I have tried looking at the numpy docs here (http://docs.scipy.org/doc/numpy/reference/routines.statistics.html), but with no success.

Ginger asked Oct 15 '13 21:10



1 Answer

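You can't pass a known mean to np.std or np.var. For that you'll have to wait for the new standard library statistics module, whose variance and pvariance functions accept a precomputed mean. In the meantime, you can save a little time by computing the variance directly from its formula: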

    In [329]: a = np.random.rand(1000)

    In [330]: %%timeit
       .....: a.mean()
       .....: a.var()
       .....:
    10000 loops, best of 3: 80.6 µs per loop

    In [331]: %%timeit
       .....: m = a.mean()
       .....: np.mean((a-m)**2)
       .....:
    10000 loops, best of 3: 60.9 µs per loop

    In [332]: m = a.mean()

    In [333]: a.var()
    Out[333]: 0.078365856465916137

    In [334]: np.mean((a-m)**2)
    Out[334]: 0.078365856465916137
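Since the question asks for the sample standard deviation, here is a minimal sketch of how the formula above could be wrapped into a single call that returns both values. The name mean_and_var is hypothetical (it is not a NumPy function), and the ddof parameter is assumed to follow the same convention as np.var: ddof=0 for the population variance, ddof=1 for the sample variance.

```python
import numpy as np

def mean_and_var(a, ddof=0):
    """Return (mean, variance) of an array, computing the mean only once.

    Hypothetical helper, not part of NumPy.
    ddof=0 gives the population variance, ddof=1 the sample variance.
    """
    a = np.asarray(a, dtype=float)
    m = a.mean()
    # Sum of squared deviations from the mean, divided by N - ddof.
    var = ((a - m) ** 2).sum() / (a.size - ddof)
    return m, var

a = np.random.rand(1000)
m, v = mean_and_var(a, ddof=1)
s = np.sqrt(v)  # sample standard deviation
```

With ddof=1 the results should agree with a.var(ddof=1) and a.std(ddof=1) up to floating-point rounding.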

If you really are trying to speed things up, try np.dot to do the squaring and summing (since that's what a dot-product is):

    In [335]: np.dot(a-m,a-m)/a.size
    Out[335]: 0.078365856465916137

    In [336]: %%timeit
       .....: m = a.mean()
       .....: c = a-m
       .....: np.dot(c,c)/a.size
       .....:
    10000 loops, best of 3: 38.2 µs per loop
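The np.dot variant can be packaged the same way. This is only a sketch under a few assumptions: mean_and_var_dot is a hypothetical name, the input is treated as real-valued and flattened to 1-D (np.dot of a 2-D array with itself would be a matrix product, not a sum of squares), and ddof is assumed to behave as in np.var.

```python
import numpy as np

def mean_and_var_dot(a, ddof=0):
    """Return (mean, variance) using a dot product for the sum of squares.

    Hypothetical helper, not part of NumPy. Assumes real-valued input;
    the array is flattened so that np.dot is a plain inner product.
    """
    a = np.asarray(a, dtype=float).ravel()
    m = a.mean()
    c = a - m  # deviations from the mean, computed once
    # np.dot(c, c) squares and sums the deviations in one call.
    var = np.dot(c, c) / (a.size - ddof)
    return m, var

a = np.random.rand(1000)
m, v = mean_and_var_dot(a)          # population variance, like a.var()
m1, v1 = mean_and_var_dot(a, ddof=1)  # sample variance
```

How much this saves depends on the array size and the NumPy build, so it is worth re-running %%timeit on your own data before relying on it.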
askewchan answered Sep 20 '22 16:09