
Pandas: why is pandas.Series.std() so different from numpy.std()?

I have the following two code snippets.

import numpy
numpy.std([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346])
0

and

import pandas as pd
pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).std(ddof=0)
10.119288512538814

That's a huge difference.

May I ask why?

Tony asked Jul 02 '15 06:07



1 Answer

This issue is indeed already under discussion (link). The problem seems to be that the algorithm pandas uses to calculate the standard deviation is not as numerically stable as the one used by numpy.
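To illustrate what "numerically stable" means here, below is a minimal pure-Python sketch. It assumes the unstable variant is the textbook one-pass sum-of-squares formula; that is an assumption for illustration, not necessarily what pandas actually did, and the function names are made up. With values this large, the squares no longer fit exactly into a float64, so subtracting two huge, nearly equal numbers can leave a rounding error behind instead of 0.

```python
import math

values = [766897346] * 10

def std_two_pass(xs):
    """Population std (ddof=0): subtract the mean first, then square."""
    mean = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - mean) ** 2 for x in xs) / len(xs))

def std_sum_of_squares(xs):
    """Population std via sum(x^2) - sum(x)^2 / n in float64.

    Hypothetical illustration of an unstable formula; not pandas' code.
    """
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += float(x)
        total_sq += float(x) * float(x)
    var = (total_sq - total * total / len(xs)) / len(xs)
    # Rounding can push var slightly below zero, so clamp before sqrt.
    return math.sqrt(max(var, 0.0))

two_pass = std_two_pass(values)
one_pass = std_sum_of_squares(values)

# The exact integer square is not representable as a float64.
square_is_exact = int(float(values[0]) * float(values[0])) == values[0] ** 2

print("two-pass:", two_pass)
print("sum of squares:", one_pass)
print("square exactly representable:", square_is_exact)
```

The two-pass version returns exactly 0.0 because every deviation from the mean is 0 before anything gets squared. The sum-of-squares version may or may not come out as 0; the exact result depends on the order in which the rounding errors accumulate.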

An easy workaround is to apply .values to the series first and then call std on the result; in this case numpy's std is used:

pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).values.std()

which gives you the expected value 0.

Cleb answered Oct 14 '22 08:10