 

Python: Numpy standard deviation error

Tags: python, numpy

This is a simple test:

import numpy as np
data = np.array([-1,0,1])
print(data.std())
>> 0.816496580928

I don't understand how this result was generated. Obviously:

( ((-1-0)^2 + (0-0)^2 + (1-0)^2) / (3-1) )^0.5 = 1

and MATLAB gives me std([-1,0,1]) = 1. Could you help me understand how numpy.std() works?

asked Jun 05 '14 by MacSanhe

People also ask

How does Numpy calculate standard error?

NumPy does not have a dedicated standard-error function, but it does have std(), which calculates the standard deviation. So, to calculate the SEM with NumPy, compute the standard deviation and divide it by the square root of the data size.
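A minimal sketch of that recipe with the array from the question (whether the standard deviation should use ddof=0 or ddof=1 here is a convention choice, so the exact value is illustrative):

import numpy as np

data = np.array([-1, 0, 1])
# SEM = standard deviation / sqrt(number of observations)
sem = data.std(ddof=1) / np.sqrt(data.size)
print(sem)  # 0.5773502691896258 with the sample standard deviation (ddof=1)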

How does Python Numpy calculate standard deviation?

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(x)), where x = abs(a - a.mean())**2. The average squared deviation is typically calculated as x.sum() / N, where N = len(x).
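Reproducing that formula step by step with the question's data gives exactly the value the question reports (a sketch of the default behaviour only):

import numpy as np

a = np.array([-1, 0, 1])
x = np.abs(a - a.mean()) ** 2     # squared deviations from the mean: [1., 0., 1.]
print(np.sqrt(x.sum() / len(x)))  # divide by N = 3 and take the root: 0.816496580927726
print(a.std())                    # numpy's built-in gives the same value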

What is the STD in Numpy in Python?

NumPy has various functions for performing calculations on arrays of numeric data. One of these is the std() function. The numpy.std() function finds the standard deviation of a given NumPy array along the specified axis.
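As a quick illustration with a hypothetical 2-D array (not from the question), the axis argument controls whether the standard deviation is taken per row, per column, or over all elements:

import numpy as np

m = np.array([[-1, 0, 1],
              [ 2, 4, 6]])
print(m.std(axis=1))  # one value per row: 0.81649658 and 1.63299316
print(m.std(axis=0))  # one value per column
print(m.std())        # a single value over all six elements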


2 Answers

The crux of this problem is that numpy divides by N (3), not N-1 (2). As larsmans pointed out, numpy uses the population standard deviation by default, not the sample standard deviation.

So the real answer is sqrt(2/3), which is exactly that: 0.8164965...
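Checking that value directly:

import numpy as np

print(np.sqrt(2.0 / 3))  # 0.816496580927726, matching np.array([-1,0,1]).std()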

If you deliberately want to use a different value for the degrees of freedom (the default is 0), use the keyword argument ddof with a positive value:

np.std(data, ddof=1) 

... but doing so here would reintroduce your original problem, since numpy divides by N - ddof.
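A short sketch of the two conventions side by side, using the question's data:

import numpy as np

data = np.array([-1, 0, 1])
print(data.std(ddof=0))  # 0.8164965... divides by N = 3 (numpy's default)
print(data.std(ddof=1))  # 1.0          divides by N - 1 = 2 (matches MATLAB's std)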

answered Oct 08 '22 by BlackVegetable


It is worth reading the help page for the function/method before suggesting it is incorrect. The method does exactly what the docstring says it should: it divides by 3 because, by default, ddof is zero:

In [3]: numpy.std?
String form: <function std at 0x104222398>
File:        /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py
Definition:  numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Docstring:
Compute the standard deviation along the specified axis.

...

ddof : int, optional
    Means Delta Degrees of Freedom.  The divisor used in calculations
    is ``N - ddof``, where ``N`` represents the number of elements.
    By default `ddof` is zero.
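Outside IPython, the same docstring (including the N - ddof note) can be read with the built-in help function; the ? syntax shown above is IPython-specific:

import numpy as np

help(np.std)  # prints the full docstring, including the ddof/divisor description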
answered Oct 08 '22 by Oleg Sklyar