Why does numpy std() give a different result to matlab std()?

Tags:

python

numpy

matlab

standard-deviation

I am trying to convert MATLAB code to NumPy and found that NumPy's std function gives a different result.

In MATLAB:

>> std([1,3,4,6])
ans = 2.0817

In NumPy:

>>> np.std([1,3,4,6])
1.8027756377319946

Is this normal? And how should I handle this?


asked Dec 22 '14 09:12

gustavgans

2 Answers

The NumPy function np.std takes an optional parameter ddof: "Delta Degrees of Freedom". By default, this is 0. Set it to 1 to get the MATLAB result:

>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326

To add a little more context, in the calculation of the variance (of which the standard deviation is the square root) we typically divide by the number of values we have.

But if we select a random sample of N elements from a larger distribution and calculate the variance, division by N can lead to an underestimate of the actual variance. To fix this, we can lower the number we divide by (the degrees of freedom) to a number less than N (usually N-1). The ddof parameter allows us to change the divisor by the amount we specify.

Unless told otherwise, NumPy will calculate the biased estimator for the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values which have been randomly picked from a larger distribution). If the ddof parameter is given, NumPy divides by N - ddof instead.
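To make the N - ddof divisor concrete, here is a minimal sketch that recomputes both results from the question by hand. The helper name std_with_ddof is made up for illustration and is not part of NumPy.

```python
import math

import numpy as np

data = [1, 3, 4, 6]


def std_with_ddof(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the sample mean.
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))


# Divisor N (NumPy's default) and divisor N - 1 (MATLAB's default).
biased = std_with_ddof(data, ddof=0)
corrected = std_with_ddof(data, ddof=1)

print(biased, np.std(data))             # both divide by N
print(corrected, np.std(data, ddof=1))  # both divide by N - 1
```

For this data the sum of squared deviations is 13, so the two results are sqrt(13/4) and sqrt(13/3), which match the NumPy and MATLAB values in the question.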

The default behaviour of MATLAB's std is to correct the bias in the sample variance by dividing by N-1. This gets rid of some (but probably not all) of the bias in the standard deviation. This is likely to be what you want if you're using the function on a random sample of a larger distribution.
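A small simulation can illustrate both claims. This is only a sketch under assumed settings: the normal distribution, the sample size of 5, the number of trials and the seed are all arbitrary choices, not anything from the question.

```python
import numpy as np

# Assumed setup: many small samples from a normal distribution
# whose true standard deviation is known.
rng = np.random.default_rng(0)
true_sigma = 2.0
n = 5
trials = 200_000
samples = rng.normal(loc=0.0, scale=true_sigma, size=(trials, n))

# Average each estimator over all trials to approximate its expected value.
var_biased = samples.var(axis=1, ddof=0).mean()     # divides by N
var_corrected = samples.var(axis=1, ddof=1).mean()  # divides by N - 1
std_corrected = samples.std(axis=1, ddof=1).mean()  # square root of the above

print("true variance:          ", true_sigma ** 2)
print("mean variance, ddof=0:  ", var_biased)
print("mean variance, ddof=1:  ", var_corrected)
print("true std:               ", true_sigma)
print("mean std, ddof=1:       ", std_corrected)
```

With these settings the ddof=0 average should land near (N-1)/N times the true variance and the ddof=1 average near the true variance, while the averaged ddof=1 standard deviation still comes out somewhat below the true value, since taking the square root reintroduces a bias.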

The nice answer by @hbaderts gives further mathematical details.


answered Sep 21 '22 02:09

Alex Riley

The standard deviation is the square root of the variance. The variance of a random variable X is defined as

$$\sigma^2 = \operatorname{Var}(X) = E\left[(X - E[X])^2\right]$$

An estimator for the variance would therefore be

$$s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ denotes the sample mean. For randomly selected $x_i$, it can be shown that the expected value of this estimator is not the real variance, but

$$E\left[s_n^2\right] = \frac{n-1}{n}\,\sigma^2$$

If you randomly select samples and estimate the sample mean and variance, you will have to use a corrected (unbiased) estimator

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

whose expected value is $\sigma^2$. Dividing by $n-1$ instead of $n$ is also called Bessel's correction.
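For readers who want to see where the factor (n-1)/n comes from, here is a short derivation sketch. It assumes the $x_i$ are independent draws with common mean $\mu = E[X]$ and variance $\sigma^2$, and writes $\bar{x}$ for the sample mean; these symbols are introduced here for the derivation.

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2, \\
E\left[\sum_{i=1}^{n} (x_i - \mu)^2\right] &= n\sigma^2, \qquad
E\left[n(\bar{x} - \mu)^2\right] = n \cdot \frac{\sigma^2}{n} = \sigma^2, \\
E\left[\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= \frac{(n-1)\,\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2 .
\end{aligned}
```

The first line follows from expanding $(x_i - \bar{x} + \bar{x} - \mu)^2$ and using $\sum_i (x_i - \bar{x}) = 0$. Dividing the sum by $n-1$ instead of $n$ cancels the factor exactly, which is why the corrected estimator is unbiased.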

Now by default, MATLAB's std calculates the unbiased estimator, dividing by n-1. NumPy however (as @ajcr explained) calculates the biased estimator, dividing by n, by default. The parameter ddof lets you set the divisor to n-ddof. By setting it to 1 you get the same result as in MATLAB.

Similarly, MATLAB lets you pass a second parameter w, which specifies the "weighting scheme". The default, w=0, divides by n-1 (unbiased estimator), while w=1 divides by n (biased estimator).
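When porting code, one option is a thin wrapper that maps MATLAB's w flag onto NumPy's ddof. The sketch below is an assumption-laden illustration: the name matlab_like_std is made up, it only handles the scalar w=0 and w=1 cases on 1-D input, and it does not reproduce MATLAB's weight-vector form or its column-wise handling of matrices.

```python
import numpy as np


def matlab_like_std(x, w=0):
    """Hypothetical helper mimicking MATLAB's std(x, w) for 1-D input.

    w=0 divides by n - 1 (MATLAB's default), w=1 divides by n.
    """
    if w not in (0, 1):
        raise ValueError("only w=0 and w=1 are supported in this sketch")
    # MATLAB's w=0 corresponds to ddof=1, and w=1 to ddof=0.
    ddof = 1 if w == 0 else 0
    return np.std(np.asarray(x, dtype=float), ddof=ddof)


result_default = matlab_like_std([1, 3, 4, 6])       # like std(x) in MATLAB
result_biased = matlab_like_std([1, 3, 4, 6], w=1)   # like std(x, 1) in MATLAB
print(result_default, result_biased)
```

For a one-off conversion, calling np.std(x, ddof=1) directly is simpler; a wrapper like this mainly helps when many MATLAB call sites need translating consistently.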

answered Sep 22 '22 02:09

hbaderts
