The standard deviation differs between pandas and numpy. Why, and which one is correct? (The relative difference is 3.5%, which should not come from rounding; in my opinion this is high.)
Example
import numpy as np
import pandas as pd
from io import StringIO  # Python 3; the original used "from StringIO import StringIO" (Python 2)

a = '''0.057411
0.024367
0.021247
-0.001809
-0.010874
-0.035845
0.001663
0.043282
0.004433
-0.007242
0.029294
0.023699
0.049654
0.034422
-0.005380'''

# one value per line, whitespace-delimited, no header row
df = pd.read_csv(StringIO(a.strip()), sep=r'\s+', header=None)

df.std() == np.std(df)  # False
df.std()                # 0.025801
np.std(df)              # 0.024926
(0.024926 - 0.025801) / 0.024926  # about 3.5% relative difference
I use these versions:
pandas '0.14.0'
numpy '1.8.1'
NumPy is more memory-efficient than pandas. Pandas tends to perform better when the number of rows is 500K or more, while NumPy tends to perform better with 50K rows or fewer. Indexing a pandas Series is much slower than indexing a NumPy array.
std() in Python: numpy.std(arr, axis=None) computes the standard deviation of the given data (array elements) along the specified axis, if any. By default it uses ddof=0, i.e. it divides by N. The standard deviation (SD) measures the spread of the data distribution in the given data set.
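For reference, the quantity numpy.std computes can be written with the ddof ("delta degrees of freedom") parameter made explicit. The divisor is N - ddof, so the default ddof=0 divides by N, while ddof=1 divides by N - 1:

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
s_{\mathrm{ddof}} = \sqrt{\frac{1}{N - \mathrm{ddof}} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2}
```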
What is pandas? Like NumPy, pandas is one of the most widely used Python libraries in data science. It provides high-performance, easy-to-use data structures and data analysis tools. Unlike NumPy, which provides objects for multi-dimensional arrays, pandas provides an in-memory 2D table object called DataFrame.
The essential difference is the presence of the index: while a NumPy array has an implicitly defined integer index used to access the values, a pandas Series has an explicitly defined index associated with the values.
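A minimal sketch of that difference; the values and the labels "a", "b", "c" are arbitrary choices for illustration. The array is looked up by position only, while the Series is looked up by its explicit labels (positional access remains available through .iloc):

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]

# NumPy array: the positions 0, 1, 2 act as an implicit integer index
arr = np.array(values)

# pandas Series: an explicit index (labels chosen here only for illustration)
s = pd.Series(values, index=["a", "b", "c"])

print(arr[1])         # 20, looked up by position
print(s["b"])         # 20, looked up by label
print(s.iloc[1])      # 20, positional access via .iloc
print(list(s.index))  # ['a', 'b', 'c']
```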
In a nutshell, neither is "incorrect". Pandas uses the unbiased estimator of the variance (N-1 in the denominator, also known as Bessel's correction, i.e. ddof=1), whereas NumPy by default divides by N (ddof=0). To make them behave the same, pass ddof=1 to numpy.std().
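A quick check on the 15 values from the question shows both points: numpy.std with ddof=1 reproduces the pandas result, and the gap between the two defaults is exactly the factor sqrt(N / (N - 1)), about 1.0351 for N = 15, which is where the roughly 3.5% difference comes from. This sketch uses a plain Python list rather than the read_csv call above, purely to stay self-contained:

```python
import numpy as np
import pandas as pd

# The 15 values from the question
values = [0.057411, 0.024367, 0.021247, -0.001809, -0.010874,
          -0.035845, 0.001663, 0.043282, 0.004433, -0.007242,
          0.029294, 0.023699, 0.049654, 0.034422, -0.005380]
n = len(values)

pandas_std = pd.Series(values).std()       # pandas default ddof=1: divide by N-1
numpy_std = np.std(values)                 # numpy default ddof=0: divide by N
numpy_std_ddof1 = np.std(values, ddof=1)   # numpy forced to divide by N-1

print(pandas_std)   # about 0.025801, as in the question
print(numpy_std)    # about 0.024926, as in the question
print(np.isclose(pandas_std, numpy_std_ddof1))  # True

# The ratio of the two defaults depends only on N, not on the data
print(pandas_std / numpy_std)
print(np.sqrt(n / (n - 1)))
```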
For further discussion, see
For pandas to behave the same as numpy, you can pass the ddof=0 parameter instead: df.std(ddof=0).
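A minimal sketch of the pandas side, assuming a recent pandas (Series.to_numpy() is not available in 0.14). The single-column DataFrame with the column name "x" is a hypothetical stand-in for the question's df; df.std(ddof=0) returns one value per column, which matches numpy's default result:

```python
import numpy as np
import pandas as pd

values = [0.057411, 0.024367, 0.021247, -0.001809, -0.010874,
          -0.035845, 0.001663, 0.043282, 0.004433, -0.007242,
          0.029294, 0.023699, 0.049654, 0.034422, -0.005380]

# Hypothetical single-column frame standing in for the question's df
df = pd.DataFrame({"x": values})

pandas_pop_std = df.std(ddof=0)          # Series: one std per column, divide by N
numpy_std = np.std(df["x"].to_numpy())   # numpy default ddof=0, divide by N

print(np.isclose(pandas_pop_std["x"], numpy_std))  # True
```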
This short video explains quite well why n-1 might be preferred for samples: https://www.youtube.com/watch?v=Cn0skMJ2F3c
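The reason can also be seen in a small simulation, sketched here under assumed settings (standard normal data with true variance 1.0, samples of size 5, a fixed seed). Averaged over many samples, dividing by n underestimates the variance by the factor (n - 1) / n, while dividing by n - 1 does not. Note that this unbiasedness applies to the variance; its square root, the standard deviation, is still slightly biased even with ddof=1:

```python
import numpy as np

# Assumed setup for illustration: many small samples from a standard
# normal distribution, whose true variance is 1.0
rng = np.random.default_rng(0)
n = 5             # size of each sample
trials = 200_000  # number of independent samples
samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))

# Average the per-sample variance estimates over all trials
mean_var_ddof0 = samples.var(axis=1, ddof=0).mean()  # divide by n
mean_var_ddof1 = samples.var(axis=1, ddof=1).mean()  # divide by n - 1

print(mean_var_ddof0)  # close to (n - 1) / n = 0.8, i.e. biased low
print(mean_var_ddof1)  # close to 1.0, the true variance
```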