percentile rank in pandas in groups

Tags:

I can't quite figure out how to write function to accomplish a grouped percentile. I have all teams from years 1985-2012 in a data frame; the first 10 are shown below: it's currently sorted by year. I was looking to give a percentile for LgRnk grouped by Year. So for instance, 23 LgRank (worst team) for 1985 would be a 100 percentile and a 1 LgRank (best team) for 1985 would be a 1 percentile. 30 LgRank (worst team) for 2010 would be 100 percentile, etc. It needs to be grouped by year b/c of the differing number of LgRnks.

    Team                WLPer   Year LgRnk   W  L
19  Sacramento Kings    0.378   1985    18  31  51
0   Atlanta Hawks       0.415   1985    17  34  48
17  Phoenix Suns        0.439   1985    16  36  46
4   Cleveland Cavaliers 0.439   1985    15  36  46
13  Milwaukee Bucks     0.720   1985    3   59  23
3   Chicago Bulls       0.463   1985    14  38  44
16  Philadelphia 76ers  0.707   1985    4   58  24
22  Washington Wizards  0.488   1985    13  40  42
20  San Antonio Spurs   0.500   1985    12  41  41
21  Utah Jazz           0.500   1985    11  41  41

I've tried creating a function using: scipy.stats.percentileofscore and I can't quite get it.

981

asked Mar 12 '14 00:03

itjcms18

2 Answers

You can do an apply on the LgRnk column:

# just for me to normalize this, so my numbers will go from 0 to 1 in this example
In [11]: df['LgRnk'] = g.LgRnk.rank()

In [12]: g = df.groupby('Year')

In [13]: g.LgRnk.apply(lambda x: x / len(x))
Out[13]:
19    1.0
0     0.9
17    0.8
4     0.7
13    0.1
3     0.6
16    0.2
22    0.5
20    0.4
21    0.3
Name: 1985, dtype: float64

The Series groupby rank (which just applies Series.rank) take a pct argument to do just this:

In [21]: g.LgRnk.rank(pct=True)
Out[21]:
19    1.0
0     0.9
17    0.8
4     0.7
13    0.1
3     0.6
16    0.2
22    0.5
20    0.4
21    0.3
Name: 1985, dtype: float64

and directly on the WLPer column (although this is slightly different due to draws):

In [22]: g.WLPer.rank(pct=True, ascending=False)
Out[22]:
19    1.00
0     0.90
17    0.75
4     0.75
13    0.10
3     0.60
16    0.20
22    0.50
20    0.35
21    0.35
Name: 1985, dtype: float64

Note: I've changed the numbers on the first line, so you'll get different scores on your complete frame.

105

answered Oct 19 '22 17:10

Andy Hayden

You need to calculate rank within the group before normalizing within the group. The other answers will result in percentiles over 100%. I suggest:

df['percentile'] = df.groupby('year')['LgRnk'].rank(pct=True)

answered Oct 19 '22 19:10

user636224

Related questions
                            
                                IPython.parallel not using multicore?
                            
                                What's the difference between scipy.ndimage.filters.convolve and scipy.signal.convolve?
                            
                                Write a local string to a remote file using python paramiko
                            
                                How to specify more than one option in pytest config [pytest_addoption]
                            
                                Python - Generator case where nothing to return
                            
                                How to sort a list of time values?
                            
                                Array of buttons in python
                            
                                In Python, variables inside if conditions hide the global scope even if they are not executed?
                            
                                Why does the child class does not inherit the method from parent class in python in this example?
                            
                                Using rolling_apply on a DataFrame object
                            
                                Repositories of conda recipes and packages
                            
                                How to check if __str__ is implemented by an object
                            
                                Python program with thread can't catch CTRL+C
                            
                                Function is_prime - Error
                            
                                Choose random seed and save it
                            
                                Python custom function using rolling_apply for pandas
                            
                                Finding the position of an object in an image
                            
                                Pyramid stream response body
                            
                                How to create raw string from string variable in python?
                            
                                OpenCV feature matching for multiple images

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

percentile rank in pandas in groups

Tags:

python

pandas

numpy

statistics

scipy

itjcms18

People also ask

2 Answers

Andy Hayden

user636224

Recent Activity

Donate For Us