
pandas quantiles in series containing infinity?

Tags: python, pandas

I have the following dataframe:

   calc_value
0         NaN
1    0.000000
2    0.100000
3    0.500000
4    2.333333
5         inf

Now I want to calculate some quantiles:

print(df.quantile(.1)['calc_value'])
print(df.quantile(.25)['calc_value'])
print(df.quantile(.5)['calc_value'])
print(df.quantile(.75)['calc_value'])
print(df.quantile(.9)['calc_value'])

But this returns:

0.04
0.1
0.5
nan
inf

I don't understand why the 0.75 quantile comes out as nan. Shouldn't it be infinity?

Richard asked Apr 12 '16 09:04



1 Answer

I think it may be a bug in numpy:

np.percentile([0,1,np.inf], 50)
Out[63]: nan

while

np.median([0, 1, np.inf])
Out[65]: 1.0

Instead of simply taking the value at index 1, it interpolates between the values at indices 1 and 2 with weights 1 and 0. The second term is therefore 0 * inf, which is nan, so the whole result is nan.
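To make the arithmetic concrete, here is a minimal pure-Python sketch of a linear-interpolation quantile written as a plain weighted sum of both neighbours. The function name naive_linear_quantile is hypothetical and this is not numpy's actual code; it only illustrates how a zero weight multiplied by inf produces nan.

```python
import math

def naive_linear_quantile(sorted_values, q):
    """Quantile by linear interpolation, always summing both neighbours.

    Hypothetical illustration only; not the numpy implementation.
    """
    pos = q * (len(sorted_values) - 1)           # fractional index of the quantile
    lower = math.floor(pos)                      # neighbour at or below pos
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = pos - lower                         # weight given to the upper neighbour
    # When weight == 0 and the upper neighbour is inf, this adds inf * 0.0 == nan.
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight

# The small example from this answer: the median lands exactly on index 1.
median = naive_linear_quantile([0.0, 1.0, math.inf], 0.5)

# The question's data with the NaN row dropped.
values = [0.0, 0.1, 0.5, 2.333333, math.inf]
q75 = naive_linear_quantile(values, 0.75)        # lands exactly on index 3
q90 = naive_linear_quantile(values, 0.9)         # lands between index 3 and 4

print(median, q75, q90)
```

Under this sketch the 0.75 quantile falls exactly on index 3, so the infinite neighbour gets weight 0 and poisons the sum, while the 0.9 quantile gives inf a positive weight and comes out as inf, matching the pattern in the question.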


In your case the 0.75 quantile should be 2.33. You can check this by replacing the infinity with a large finite value, for example df.iloc[5,0] = 1e10.
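The expected value can be checked without pandas. The sketch below assumes a hypothetical helper, quantile_skip_zero_weight, that returns the lower neighbour directly when the quantile lands exactly on an index, and it also tries the suggested substitution of 1e10 for inf. It is an illustration under those assumptions, not a description of how pandas or numpy behave.

```python
import math

def quantile_skip_zero_weight(sorted_values, q):
    """Linear-interpolation quantile that ignores a zero-weight neighbour.

    Hypothetical helper for illustration; not a pandas or numpy function.
    """
    pos = q * (len(sorted_values) - 1)   # fractional index of the quantile
    lower = math.floor(pos)
    weight = pos - lower                 # weight given to the upper neighbour
    if weight == 0:
        # Exact hit on an index: the upper neighbour contributes nothing.
        return sorted_values[lower]
    return sorted_values[lower] * (1 - weight) + sorted_values[lower + 1] * weight

# The question's data with the NaN row dropped.
values = [0.0, 0.1, 0.5, 2.333333, math.inf]

# Same data with inf replaced by a large finite number, as suggested above.
capped = [1e10 if math.isinf(v) else v for v in values]

q75_inf = quantile_skip_zero_weight(values, 0.75)
q75_capped = quantile_skip_zero_weight(capped, 0.75)

print(q75_inf, q75_capped)
```

Both calls give 2.333333 for the 0.75 quantile, which is the value this answer expects.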

ptrj answered Oct 20 '22 14:10