I have the following dataframe:
calc_value
0 NaN
1 0.000000
2 0.100000
3 0.500000
4 2.333333
5 inf
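A minimal sketch that rebuilds this DataFrame so the problem can be reproduced. It assumes the column name `calc_value` from the printout, and 2.333333 is typed in exactly as displayed (the original value may have been 7/3 or similar).

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the printout above; 2.333333 is copied as displayed.
df = pd.DataFrame({"calc_value": [np.nan, 0.0, 0.1, 0.5, 2.333333, np.inf]})
print(df)
```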
Now I want to calculate some quantiles:
print(df.quantile(.1)['calc_value'])
print(df.quantile(.25)['calc_value'])
print(df.quantile(.5)['calc_value'])
print(df.quantile(.75)['calc_value'])
print(df.quantile(.9)['calc_value'])
But this returns:
0.04
0.1
0.5
nan
inf
I don't understand why the 0.75 quantile comes out as nan. Shouldn't it be infinity?
Infinite values are represented in NumPy as np.inf (and -np.inf for negative infinity). You can remove them from the rows and columns of a pandas DataFrame with the replace() and dropna() methods: first replace the infinities with NaN, then drop the rows that contain NaN.
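For example, a minimal sketch of that approach on the question's column (the variable name `finite` is just for illustration). Note that dropping the infinite row changes the data, so the quantiles of `finite` are not the quantiles the question is asking about.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"calc_value": [np.nan, 0.0, 0.1, 0.5, 2.333333, np.inf]})

# Turn +inf/-inf into NaN, then drop every row that now holds a NaN.
finite = df.replace([np.inf, -np.inf], np.nan).dropna()
print(finite)
```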
Pandas DataFrame quantile() method: quantile() calculates the quantile of the values along a given axis. By default (axis=0, i.e. axis='index') it computes the quantile down each column and returns one value per column. With axis='columns' it computes the quantile across each row and returns one value per row.
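A small sketch of the two axis settings, on a hypothetical two-column frame (the names `frame`, `a` and `b` are made up for illustration): the default gives one quantile per column, `axis="columns"` gives one per row.

```python
import pandas as pd

# Hypothetical frame, only to show the shape of each result.
frame = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

per_column = frame.quantile(0.5)               # Series indexed by column: a, b
per_row = frame.quantile(0.5, axis="columns")  # Series indexed by row: 0, 1, 2
print(per_column)
print(per_row)
```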
I think it may be a bug in numpy:
np.percentile([0,1,np.inf], 50)
Out[63]: nan
while
np.median([0, 1, np.inf])
Out[65]: 1.0
Instead of simply taking the value at index 1, np.percentile interpolates between the values at indices 1 and 2 with weights 1 and 0. The second term is therefore 0 * inf, which is nan, and nan propagates into the result.
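The arithmetic can be mimicked in plain Python. `naive_percentile` and `guarded_percentile` below are hypothetical helpers written for illustration, not NumPy's actual code (which differs in details and may vary between versions). The first uses the weighted form described above; the second is one possible fix that returns the element directly when the fractional weight is 0.

```python
import math

def naive_percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) using explicit weights."""
    xs = sorted(values)
    h = (len(xs) - 1) * q / 100.0  # fractional position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    frac = h - lo
    # Weights (1 - frac) and frac: with frac == 0 and xs[hi] == inf this adds 0 * inf.
    return xs[lo] * (1 - frac) + xs[hi] * frac

def guarded_percentile(values, q):
    """Same rule, but skip the interpolation when h lands exactly on an element."""
    xs = sorted(values)
    h = (len(xs) - 1) * q / 100.0
    lo = math.floor(h)
    frac = h - lo
    if frac == 0:
        return xs[lo]
    return xs[lo] * (1 - frac) + xs[lo + 1] * frac

data = [0, 1, math.inf]
print(naive_percentile(data, 50))    # nan, because 0 * inf is nan
print(guarded_percentile(data, 50))  # 1
```

Applied to the question's five non-NaN values, `guarded_percentile(..., 75)` lands exactly on index 3 and returns 2.333333, while the 90th percentile still interpolates towards inf and returns inf.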
In your case the result should be 2.33 (try, for example, df.iloc[5,0] = 1e10).
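A quick way to confirm this, following the suggestion above: swap the infinity for a large finite number and recompute. Because the 0.75 quantile falls exactly on the fourth non-NaN value, the large value gets weight 0, and 0 times a finite number is 0, so the result is no longer poisoned.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"calc_value": [np.nan, 0.0, 0.1, 0.5, 2.333333, np.inf]})

# Replace inf (row 5, column 0) with a large but finite number.
df.iloc[5, 0] = 1e10

q75 = df["calc_value"].quantile(0.75)
print(q75)
```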