Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to find the index for a quantile

Tags:

python

pandas

I'm using a pandas series and I want to find the index value that represents the quantile.

If I have:

np.random.seed(8)
s = pd.Series(np.random.rand(6), ['a', 'b', 'c', 'd', 'e', 'f'])
s

a    0.873429
b    0.968541
c    0.869195
d    0.530856
e    0.232728
f    0.011399
dtype: float64

And do

s.quantile(.5)

I get

0.70002511588475946

What I want to know is what is the index value of s that represents the point just before that quantile value. In this case I know the index value should be d.

like image 662
Brian Avatar asked Jul 16 '16 15:07

Brian


People also ask

How is quantile calculated in pandas?

Pandas DataFrame quantile() Method The quantile() method calculates the quantile of the values in a given axis. Default axis is row. By specifying the column axis ( axis='columns' ), the quantile() method calculates the quantile column-wise and returns the mean value for each row.

How does Python find Quantiles?

In Python, the numpy. quantile() function takes an array and a number say q between 0 and 1. It returns the value at the q th quantile.


2 Answers

If you set the interpolation argument to 'lower', 'higher', or 'nearest' then the problem can be solved a bit more simply as:

s[s == s.quantile(.5, interpolation='lower')]

I'd guess this method is a fair bit faster than piRSquared's solution as well

like image 186
JoeTheShmoe Avatar answered Sep 25 '22 13:09

JoeTheShmoe


Use sort_values, reverse the order, find all that are less than or equal to the quantile calculated, then find the idxmax.

(s.sort_values()[::-1] <= s.quantile(.5)).idxmax()

Or:

(s.sort_values(ascending=False) <= s.quantile(.5)).idxmax()

We can functionalize it:

def idxquantile(s, q=0.5, *args, **kwargs):
    qv = s.quantile(q, *args, **kwargs)
    return (s.sort_values()[::-1] <= qv).idxmax()

idxquantile(s)
like image 30
piRSquared Avatar answered Sep 26 '22 13:09

piRSquared