I can't quite figure out how to write a function to accomplish a grouped percentile. I have all teams from the years 1985-2012 in a data frame; the first 10 rows are shown below, and the frame is currently sorted by year. I want to compute a percentile for LgRnk grouped by Year. For instance, LgRnk 23 (the worst team) in 1985 would be the 100th percentile, and LgRnk 1 (the best team) in 1985 would be the 1st percentile. LgRnk 30 (the worst team) in 2010 would also be the 100th percentile, and so on. It needs to be grouped by year because the number of LgRnk values differs from year to year.
Team WLPer Year LgRnk W L
19 Sacramento Kings 0.378 1985 18 31 51
0 Atlanta Hawks 0.415 1985 17 34 48
17 Phoenix Suns 0.439 1985 16 36 46
4 Cleveland Cavaliers 0.439 1985 15 36 46
13 Milwaukee Bucks 0.720 1985 3 59 23
3 Chicago Bulls 0.463 1985 14 38 44
16 Philadelphia 76ers 0.707 1985 4 58 24
22 Washington Wizards 0.488 1985 13 40 42
20 San Antonio Spurs 0.500 1985 12 41 41
21 Utah Jazz 0.500 1985 11 41 41
I've tried creating a function using scipy.stats.percentileofscore, but I can't quite get it to work.
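For reference, here is a minimal sketch of how scipy.stats.percentileofscore could be applied per year. The toy frame, the LgRnkPct column name, and the pct_within_group helper are made up for illustration; only groupby, transform, and percentileofscore are real library calls.

```python
import pandas as pd
from scipy.stats import percentileofscore

# Toy data (hypothetical): two seasons with different numbers of teams.
df = pd.DataFrame({
    "Year": [1985, 1985, 1985, 1985, 2010, 2010],
    "LgRnk": [4, 1, 3, 2, 2, 1],
})

def pct_within_group(ranks: pd.Series) -> pd.Series:
    """Score every value against the other values of its own group (0-100)."""
    values = ranks.to_numpy()
    return ranks.map(lambda v: percentileofscore(values, v))

# transform keeps the original index, so the result lines up with df.
df["LgRnkPct"] = df.groupby("Year")["LgRnk"].transform(pct_within_group)
print(df)
```

This calls percentileofscore once per row, so it is likely to be slow on a large frame; it is only meant to show how the per-group call fits together.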
Let us see how to find the percentile rank of a column in a Pandas DataFrame. We will use the rank() function with the argument pct = True to find the percentile rank.
You can do an apply on the LgRnk column:
# just for me to normalize this, so my numbers will go from 0 to 1 in this example
In [11]: df['LgRnk'] = df.groupby('Year').LgRnk.rank()
In [12]: g = df.groupby('Year')
In [13]: g.LgRnk.apply(lambda x: x / len(x))
Out[13]:
19 1.0
0 0.9
17 0.8
4 0.7
13 0.1
3 0.6
16 0.2
22 0.5
20 0.4
21 0.3
Name: 1985, dtype: float64
The Series groupby rank method (which just applies Series.rank to each group) takes a pct argument to do just this:
In [21]: g.LgRnk.rank(pct=True)
Out[21]:
19 1.0
0 0.9
17 0.8
4 0.7
13 0.1
3 0.6
16 0.2
22 0.5
20 0.4
21 0.3
Name: 1985, dtype: float64
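To see the effect of grouping when years have different numbers of teams, here is a small self-contained sketch. The six-row frame is invented for illustration and is not the asker's data.

```python
import pandas as pd

# Toy data (hypothetical): 4 teams in 1985, 2 teams in 2010.
df = pd.DataFrame({
    "Year": [1985, 1985, 1985, 1985, 2010, 2010],
    "LgRnk": [4, 1, 3, 2, 2, 1],
})

# Rank within each year and scale by that year's team count.
pct = df.groupby("Year")["LgRnk"].rank(pct=True)
print(pct)
```

The worst team in each year comes out at 1.0 regardless of league size, while the best team gets 1/N (0.25 in the four-team year, 0.5 in the two-team year) rather than 0.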
You can also rank directly on the WLPer column (the result differs slightly because of ties):
In [22]: g.WLPer.rank(pct=True, ascending=False)
Out[22]:
19 1.00
0 0.90
17 0.75
4 0.75
13 0.10
3 0.60
16 0.20
22 0.50
20 0.35
21 0.35
Name: 1985, dtype: float64
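Tied WLPer values share an averaged rank by default, which is why two teams show 0.75 above. As a sketch on invented data, the method argument of rank can be used to choose a different tie rule:

```python
import pandas as pd

# Toy data (hypothetical): one season with two teams tied on WLPer.
df = pd.DataFrame({
    "Year": [1985] * 4,
    "WLPer": [0.439, 0.439, 0.500, 0.720],
})
g = df.groupby("Year")["WLPer"]

# Default method="average": tied rows share the mean of their ranks.
avg = g.rank(pct=True, ascending=False)
# method="min" gives tied rows the best (lowest) rank among them.
low = g.rank(pct=True, ascending=False, method="min")
# method="max" gives tied rows the worst (highest) rank among them.
high = g.rank(pct=True, ascending=False, method="max")

print(pd.DataFrame({"average": avg, "min": low, "max": high}))
```

Which rule is appropriate depends on how tied teams should be treated; the default matches the output shown above.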
Note: I've changed the numbers on the first line, so you'll get different scores on your complete frame.
You need to calculate rank within the group before normalizing within the group. The other answers will result in percentiles over 100%. I suggest:
df['percentile'] = df.groupby('Year')['LgRnk'].rank(pct=True)
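One way values over 100% can arise, sketched on invented data: if the rank is computed over the whole frame and only the division is done per year, the numerator can exceed the year's team count. This is an assumption about what the remark above refers to, not a reproduction of any particular answer.

```python
import pandas as pd

# Toy data (hypothetical): 3 teams in 1985, 2 teams in 2010.
df = pd.DataFrame({
    "Year": [1985, 1985, 1985, 2010, 2010],
    "LgRnk": [1, 2, 3, 1, 2],
})

# Rank across ALL years, then divide by each year's team count.
global_rank = df["LgRnk"].rank()
sizes = df.groupby("Year")["LgRnk"].transform("count")
ungrouped = global_rank / sizes

# Rank and normalize within each year in a single step.
grouped = df.groupby("Year")["LgRnk"].rank(pct=True)

print(pd.DataFrame({"ungrouped": ungrouped, "grouped": grouped}))
```

In this sketch the ungrouped column goes above 1.0, while the grouped column stays in the (0, 1] range.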