Cumulative Ranking of Values in Pandas with Ties

Question

I am trying to find a way to do a cumulative total that accounts for ties in Pandas.

Lets take hypothetical data from a track meet, where I have people, races, heats, and time.

Each person's placement is according to the following:

For a given race/heat combination:

The person person with the lowest time placed first
The person with the second lowest time placed second

and so on...

This would be fairly simple code, but for one thing..

If two people have the same time, they both get the same place and then the next time greater than their time will have that value + 1 as the placement.

In the table below, for 100 yard dash, heat 1, RUNNER1 finished first, RUNNER2/RUNNER3 finished second, and RUNNER3 finished third (next time after RUNNER2/RUNNER3)

So basically, the logic is as follows:

If race <> race.shift() or heat <> heat.shift() then place =1

If race = race.shift() and heat = heat.shift() and time>time.shift then place =place.shift()+1

If race = race.shift() and heat = heat.shift() and time>time.shift then place =place.shift()

The part that confuses me is how to handle the ties. Otherwise I could do something like

df['Place']=np.where(
              (df['race']==df['race'].shift())
              &
              (df['heat']==df['heat'].shift()),
              df['Place'].shift()+1,
              1)

Thank you!

Sample data follows:

Person,Race,Heat,Time
RUNNER1,100 Yard Dash,1,9.87
RUNNER2,100 Yard Dash,1,9.92
RUNNER3,100 Yard Dash,1,9.92
RUNNER4,100 Yard Dash,1,9.96
RUNNER5,100 Yard Dash,1,9.97
RUNNER6,100 Yard Dash,1,10.01
RUNNER7,100 Yard Dash,2,9.88
RUNNER8,100 Yard Dash,2,9.93
RUNNER9,100 Yard Dash,2,9.93
RUNNER10,100 Yard Dash,2,10.03
RUNNER11,100 Yard Dash,2,10.26
RUNNER7,200 Yard Dash,1,19.63
RUNNER8,200 Yard Dash,1,19.67
RUNNER9,200 Yard Dash,1,19.72
RUNNER10,200 Yard Dash,1,19.72
RUNNER11,200 Yard Dash,1,19.86
RUNNER12,200 Yard Dash,1,19.92

what I want at the end is

Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,1,9.96,3
RUNNER5,100 Yard Dash,1,9.97,4
RUNNER6,100 Yard Dash,1,10.01,5
RUNNER7,100 Yard Dash,2,9.88,1
RUNNER8,100 Yard Dash,2,9.93,2
RUNNER9,100 Yard Dash,2,9.93,2
RUNNER10,100 Yard Dash,2,10.03,3
RUNNER11,100 Yard Dash,2,10.26,4
RUNNER7,200 Yard Dash,1,19.63,1
RUNNER8,200 Yard Dash,1,19.67,2
RUNNER9,200 Yard Dash,1,19.72,3
RUNNER10,200 Yard Dash,1,19.72,3
RUNNER11,200 Yard Dash,1,19.86,4
RUNNER12,200 Yard Dash,1,19.92,4

[edit] Now, one step further..

Lets assume that once I leave a set of unique values, the next time that set comes up, the values will reset to 1..

So, for example, - Note that it goes to "heat 1" and then "heat 2" and back to "heat 1" - I don't want the rankings to continue from the prior "heat 1", rather I want them to reset.

Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,2,9.96,1
RUNNER5,100 Yard Dash,2,9.97,2
RUNNER6,100 Yard Dash,2,10.01,3
RUNNER7,100 Yard Dash,1,9.88,1
RUNNER8,100 Yard Dash,1,9.93,2
RUNNER9,100 Yard Dash,1,9.93,2

unutbu · Accepted Answer

You could use:

grouped =  df.groupby(['Race','Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)

import pandas as pd
df = pd.DataFrame({'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1], 'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'], 'Race': ['100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '200 Yard Dash', '200 Yard Dash', '200 Yard Dash', '200 Yard Dash', '200 Yard Dash', '200 Yard Dash'], 'Time': [9.8699999999999992, 9.9199999999999999, 9.9199999999999999, 9.9600000000000009, 9.9700000000000006, 10.01, 9.8800000000000008, 9.9299999999999997, 9.9299999999999997, 10.029999999999999, 10.26, 19.629999999999999, 19.670000000000002, 19.719999999999999, 19.719999999999999, 19.859999999999999, 19.920000000000002]})

grouped =  df.groupby(['Race','Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)

yields

    Heat    Person           Race   Time  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26    4.0   5.0
11     1   RUNNER7  200 Yard Dash  19.63    1.0   1.0
12     1   RUNNER8  200 Yard Dash  19.67    2.0   2.0
13     1   RUNNER9  200 Yard Dash  19.72    3.0   3.0
14     1  RUNNER10  200 Yard Dash  19.72    3.0   3.0
15     1  RUNNER11  200 Yard Dash  19.86    4.0   5.0
16     1  RUNNER12  200 Yard Dash  19.92    5.0   6.0

Note that Pandas has a Groupby.rank method which can compute many common forms of rank -- but not the one you described. Notice how for example on row 3 the Rank is 4 after a tie between the second and third runners, while the Place is 3.

Regarding the edit: Use

(df['Heat'] != df['Heat'].shift()).cumsum()

to disambiguate the heats:

import pandas as pd
df = pd.DataFrame({'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1], 'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'], 'Race': ['100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash'], 'Time': [9.8699999999999992, 9.9199999999999999, 9.9199999999999999, 9.9600000000000009, 9.9700000000000006, 10.01, 9.8800000000000008, 9.9299999999999997, 9.9299999999999997, 10.029999999999999, 10.26, 19.629999999999999, 19.670000000000002, 19.719999999999999, 19.719999999999999, 19.859999999999999, 19.920000000000002]})

df['HeatGroup'] = (df['Heat'] != df['Heat'].shift()).cumsum()
grouped =  df.groupby(['Race','HeatGroup'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)

yields

    Heat    Person           Race   Time  HeatGroup  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87          1    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92          1    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92          1    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96          1    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97          1    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01          1    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88          2    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93          2    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93          2    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03          2    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26          2    4.0   5.0
11     1   RUNNER7  100 Yard Dash  19.63          3    1.0   1.0
12     1   RUNNER8  100 Yard Dash  19.67          3    2.0   2.0
13     1   RUNNER9  100 Yard Dash  19.72          3    3.0   3.0
14     1  RUNNER10  100 Yard Dash  19.72          3    3.0   3.0
15     1  RUNNER11  100 Yard Dash  19.86          3    4.0   5.0
16     1  RUNNER12  100 Yard Dash  19.92          3    5.0   6.0

Cumulative Ranking of Values in Pandas with Ties

Tags:

python

pandas

dataframe

cumulative-sum

Stumbling Through Data Science

1 Answers

unutbu

Recent Activity

Donate For Us

Cumulative Ranking of Values in Pandas with Ties

Tags:

python

pandas

dataframe

cumulative-sum

Stumbling Through Data Science

1 Answers

unutbu

Related questions

Recent Activity

Donate For Us