I have a list:
s = [0.995537725, 0.994532199, 0.996027983, 0.999891383, 1.004754272, 1.003870012, 0.999888944, 0.994438078, 0.992548715, 0.998344545, 1.004504764, 1.00883411]
When I calculated its standard deviation in Excel using =STDEV(C5:N5), I got the answer: 0.005106477
Then I did the same calculation using numpy.std:
import numpy as np
print(np.std(s))
However, I got the answer: 0.0048890791894
I even wrote my own std function:
import math

def std(input_list):
    # population standard deviation: divide the sum of squared
    # deviations by n, not n - 1
    count = len(input_list)
    mean = float(sum(input_list)) / float(count)
    overall = 0.0
    for i in input_list:
        overall = overall + (i - mean) * (i - mean)
    return math.sqrt(overall / count)
and my own function gives the same result as numpy.
So I am wondering: why is there such a difference? Or did I make some mistake?
There's a difference: Excel's STDEV calculates the sample standard deviation, while NumPy's std calculates the population standard deviation by default (it behaves like Excel's STDEVP).

To make NumPy's std function behave like Excel's STDEV, pass in the value ddof=1:

>>> np.std(s, ddof=1)
0.0051064766704396617

This calculates the standard deviation of s using the sample variance (i.e. dividing by n - 1 rather than n).