pandas describe 0.18.1 vs pandas describe 0.17.0

On one environment, I have pandas version 0.17.0 with numpy version 1.10.1. On another environment, I have pandas version 0.18.1 with numpy version 1.10.4.

I run this piece of code in both:

from pandas import Series
import numpy as np
# np.nan is the canonical spelling; the np.NaN alias was removed in NumPy 2.0
Series([1, 2, 3, 4, 5, np.nan]).describe()

With pandas version 0.17.0 I get this output:

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
dtype: float64

With pandas version 0.18.1 I get this output:

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%           NaN
50%           NaN
75%           NaN
max      5.000000
dtype: float64

What gives? Why are the percentiles NaN in the newer version?

cyth217 asked May 09 '16 18:05



1 Answer

Series.describe() computes its percentiles with Series.quantile(), and there is currently a reported bug (#13098) in pandas 0.18.1 where Series.quantile() returns NaN instead of the percentile when the series contains NaN.

Bug demo from #13098:

>>> import pandas as pd
>>> import numpy
>>> s = pd.Series([1, 2, 3, 4, numpy.nan])
>>> s.quantile(0.5)
nan

Judging by pull request #12752, it looks like notnull used to be applied to drop the NaN values before the percentiles were calculated, but that step was removed.
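On an affected version, one possible workaround is to drop the NaN values yourself before calling describe() or quantile(), which is roughly what the notnull step appears to have done. A minimal sketch using the series from the question (the variable names clean, summary and median are only for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, np.nan])

# Keep only the non-null values before computing any percentiles.
clean = s[s.notnull()]

summary = clean.describe()    # percentiles are computed on 1..5 only
median = clean.quantile(0.5)

print(summary)
print(median)
```

s.dropna() should be an equivalent way to build clean. Since count, mean, std, min and max already ignored the NaN in both outputs above, only the percentile rows should change.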


Update:

This issue now appears to be closed by a commit, after which Series.quantile() once again handles NaN (2016/05/12).
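To see whether a particular installation is affected, a quick check along these lines may help. It is only a sketch and assumes that an unaffected version skips the NaN, so the median of the demo series is taken over 1, 2, 3, 4 (the names q and affected are illustrative):

```python
import numpy as np
import pandas as pd

print(pd.__version__)

s = pd.Series([1, 2, 3, 4, np.nan])
q = s.quantile(0.5)

# On an affected version q is NaN; otherwise the NaN is ignored.
affected = bool(np.isnan(q))
print(q, affected)
```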

miradulo answered Oct 03 '22 17:10