I'm using a pandas series and I want to find the index value that represents the quantile.
If I have:
import numpy as np
import pandas as pd
np.random.seed(8)
s = pd.Series(np.random.rand(6), ['a', 'b', 'c', 'd', 'e', 'f'])
s
a 0.873429
b 0.968541
c 0.869195
d 0.530856
e 0.232728
f 0.011399
dtype: float64
And do
s.quantile(.5)
I get
0.70002511588475946
What I want to know is what is the index value of s that represents the point just before that quantile value. In this case I know the index value should be d.
The pandas DataFrame quantile() method calculates the quantile of the values along a given axis. The default axis is the row axis. By specifying the column axis (axis='columns'), quantile() calculates the quantile across the columns and returns one quantile value for each row.
In Python, the numpy.quantile() function takes an array and a number q between 0 and 1. It returns the value at the q-th quantile.
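For illustration, here is a minimal sketch of numpy.quantile() on the sample values from the question. The values are typed in by hand from the printed output (rounded to six decimals), so the result differs slightly from the full-precision number shown in the question; the names values and median are just placeholders.

```python
import numpy as np

# Sample values copied from the question's printed output (rounded).
values = np.array([0.873429, 0.968541, 0.869195, 0.530856, 0.232728, 0.011399])

# q=0.5 asks for the median; with the default linear interpolation this is
# the midpoint between the two middle values of the sorted array.
median = np.quantile(values, 0.5)
print(median)
```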
If you set the interpolation argument to 'lower', 'higher', or 'nearest' then the problem can be solved a bit more simply as:
s[s == s.quantile(.5, interpolation='lower')]
I'd guess this method is a fair bit faster than piRSquared's solution as well.
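The expression above returns a one-element Series rather than the label itself. If you only want the label, something like the sketch below should work. It assumes the sample values printed in the question (typed in by hand, rounded), and q_val and labels are just illustrative names.

```python
import pandas as pd

# Sample data copied from the question's printed output (rounded).
s = pd.Series([0.873429, 0.968541, 0.869195, 0.530856, 0.232728, 0.011399],
              index=['a', 'b', 'c', 'd', 'e', 'f'])

# interpolation='lower' makes quantile() return an actual element of s
# instead of a value interpolated between two elements.
q_val = s.quantile(0.5, interpolation='lower')

# The boolean mask picks the matching element(s); .index gives the label(s).
labels = s[s == q_val].index.tolist()
print(labels)  # ['d']
```

Note that if the series contains duplicate values, the mask can match more than one label, which is why this sketch keeps a list instead of taking a single element.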
Use sort_values, reverse the order, find all that are less than or equal to the quantile calculated, then find the idxmax.
(s.sort_values()[::-1] <= s.quantile(.5)).idxmax()
Or:
(s.sort_values(ascending=False) <= s.quantile(.5)).idxmax()
We can functionalize it:
def idxquantile(s, q=0.5, *args, **kwargs):
    qv = s.quantile(q, *args, **kwargs)
    return (s.sort_values()[::-1] <= qv).idxmax()
idxquantile(s)
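As a quick sanity check, here is a sketch that runs the function for a few quantiles. The data is the rounded sample from the question, and the results dict and the chosen quantiles are my own illustration, not part of the original answer.

```python
import pandas as pd

def idxquantile(s, q=0.5, *args, **kwargs):
    # Quantile value (linearly interpolated by default).
    qv = s.quantile(q, *args, **kwargs)
    # Sort descending, flag values <= qv, and take the label of the first True.
    return (s.sort_values()[::-1] <= qv).idxmax()

# Sample data copied from the question's printed output (rounded).
s = pd.Series([0.873429, 0.968541, 0.869195, 0.530856, 0.232728, 0.011399],
              index=['a', 'b', 'c', 'd', 'e', 'f'])

# Label of the largest value at or below each quantile.
results = {q: idxquantile(s, q) for q in (0.25, 0.5, 0.75)}
print(results)  # {0.25: 'e', 0.5: 'd', 0.75: 'c'}
```

Because the comparison is against the interpolated quantile, the result is always the label of the element just below (or equal to) that value, which matches what the question asks for.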