Different std in pandas vs numpy

Tags: python, pandas, numpy

The standard deviation differs between pandas and numpy. Why, and which one is correct? (The relative difference is 3.5%, which is too large to come from rounding, in my opinion.)

Example

```python
import numpy as np
import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"

a = '''0.057411
0.024367
0.021247
-0.001809
-0.010874
-0.035845
0.001663
0.043282
0.004433
-0.007242
0.029294
0.023699
0.049654
0.034422
-0.005380'''

# One value per line -> a single-column DataFrame with 15 rows
df = pd.read_csv(StringIO(a.strip()), sep=r'\s+', header=None)

df.std() == np.std(df)  # False
df.std()                # 0.025801
np.std(df)              # 0.024926

(0.024926 - 0.025801) / 0.024926  # about -0.035, i.e. a 3.5% relative difference
```

I use these versions:

pandas '0.14.0' numpy '1.8.1'

asked Jul 27 '14 18:07

Mannaggia

2 Answers

In a nutshell, neither is "incorrect". Pandas divides by N-1 by default (ddof=1, which gives the unbiased estimator of the variance), whereas NumPy by default divides by N (ddof=0).
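To see where the 3.5% comes from, here is a small pure-Python sketch of both estimators. `manual_std` is a hypothetical helper written for illustration, not a pandas or NumPy function. Both versions divide the sum of squared deviations from the mean by N - ddof, so for the same data their ratio is always sqrt(N / (N - 1)), which for the N = 15 values in the question is about 1.035.

```python
import math

# The 15 values from the question
values = [0.057411, 0.024367, 0.021247, -0.001809, -0.010874,
          -0.035845, 0.001663, 0.043282, 0.004433, -0.007242,
          0.029294, 0.023699, 0.049654, 0.034422, -0.005380]


def manual_std(xs, ddof):
    """Hypothetical helper: sqrt(sum of squared deviations / (N - ddof))."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(ss / (n - ddof))


sample_std = manual_std(values, ddof=1)      # N-1 denominator (pandas default)
population_std = manual_std(values, ddof=0)  # N denominator (NumPy default)

print(round(sample_std, 6), round(population_std, 6))
# The ratio depends only on N, not on the data: sqrt(15 / 14)
print(sample_std / population_std, math.sqrt(15 / 14))
```

Because the factor depends only on N, the gap shrinks as the sample grows: for N = 1000 it is about 0.05%.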

To make them behave the same, pass ddof=1 to numpy.std().
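A minimal sketch of that fix on the 15 values from the question. It puts them in a Series (rather than the question's one-column DataFrame) so both calls return a plain scalar; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = [0.057411, 0.024367, 0.021247, -0.001809, -0.010874,
          -0.035845, 0.001663, 0.043282, 0.004433, -0.007242,
          0.029294, 0.023699, 0.049654, 0.034422, -0.005380]
s = pd.Series(values)

pandas_std = s.std()                         # pandas default: ddof=1 (divide by N-1)
numpy_std = np.std(s.to_numpy(), ddof=1)     # override NumPy's default ddof=0

print(pandas_std, numpy_std)  # the two now agree
```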

For further discussion, see

Can someone explain biased/unbiased population/sample standard deviation?
Population variance and sample variance.
Why divide by n-1?

answered Sep 23 '22 11:09

NPE

To make pandas behave the same as NumPy, pass the ddof=0 parameter: df.std(ddof=0).
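The pandas-side equivalent, as a sketch on the same data: a one-column DataFrame like the question's `df`, where `df.std(ddof=0)` returns a one-element Series, hence the `.iloc[0]`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = [0.057411, 0.024367, 0.021247, -0.001809, -0.010874,
          -0.035845, 0.001663, 0.043282, 0.004433, -0.007242,
          0.029294, 0.023699, 0.049654, 0.034422, -0.005380]
df = pd.DataFrame(values)  # single column labelled 0, as with header=None

pandas_population_std = df.std(ddof=0).iloc[0]  # divide by N, like NumPy
numpy_std = np.std(df[0].to_numpy())            # NumPy default: ddof=0
pandas_default_std = df.std().iloc[0]           # pandas default: ddof=1

print(pandas_population_std, numpy_std, pandas_default_std)
```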

This short video explains quite well why n-1 might be preferred for samples. https://www.youtube.com/watch?v=Cn0skMJ2F3c

answered Sep 21 '22 11:09

Xuan
