Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort a group in a way that I get the largest number in the first row and smallest in the second and the second largest in the third and so on

So I have a df like this

In [1]:data= {'Group': ['A','A','A','A','A','A','B','B','B','B'],
    'Name': [ ' Sheldon Webb',' Traci Dean',' Chad Webster',' Ora Harmon',' Elijah Mendoza',' June Strickland',' Beth Vasquez',' Betty Sutton',' Joel Gill',' Vernon Stone'],
    'Performance':[33,64,142,116,122,68,95,127,132,80]}
In [2]:df = pd.DataFrame(data, columns = ['Group', 'Name','Performance'])

Out[1]:
    Group  Name             Performance
0    A     Sheldon Webb       33
1    A     Traci Dean         64
2    A     Chad Webster      142
3    A     Ora Harmon        116
4    A     Elijah Mendoza    122
5    A     June Strickland    68
6    B     Beth Vasquez       95
7    B     Betty Sutton      127
8    B     Joel Gill         132
9    B     Vernon Stone       80

I want to sort it in such an alternating way that within a group, say group "A", the first row should have its highest performing person (in this case "Chad Webster") and then in the second row the least performing (which is "Sheldon Webb").

The output I am looking for would look something like this:

Out[2]:
    Group   Name             Performance
0    A     Chad Webster      142
1    A     Sheldon Webb       33
2    A     Elijah Mendoza    122
3    A     Traci Dean         64
4    A     Ora Harmon        116
5    A     June Strickland    68
6    B     Joel Gill         132
7    B     Vernon Stone       80
8    B     Betty Sutton      127
9    B     Beth Vasquez       95

You can see the sequence is alternating between the highest and lowest within a group.

like image 478
Gamerarms Avatar asked Jul 31 '20 14:07

Gamerarms


1 Answers

Take the sorted order and then apply a quadratic function to it where the root is 1/2 the length of the array (plus some small offset). This way the highest rank is given to the extremal values (the sign of the eps offset determines whether you want a the highest value ranked above the lowest value). I added a small group at the end to show how it properly handles repeated values or an odd group size.

def extremal_rank(s):
    eps = 10**-4
    y = (pd.Series(np.arange(1, len(s)+1), index=s.sort_values().index) 
         - (len(s)+1)/2 + eps)**2
    return y.reindex_like(s)
    
df['rnk'] = df.groupby('Group')['Performance'].apply(extremal_rank)
df = df.sort_values(['Group', 'rnk'], ascending=[True, False])

   Group              Name  Performance     rnk
2      A      Chad Webster          142  6.2505
0      A      Sheldon Webb           33  6.2495
4      A    Elijah Mendoza          122  2.2503
1      A        Traci Dean           64  2.2497
3      A        Ora Harmon          116  0.2501
5      A   June Strickland           68  0.2499
8      B         Joel Gill          132  2.2503
9      B      Vernon Stone           80  2.2497
7      B      Betty Sutton          127  0.2501
6      B      Beth Vasquez           95  0.2499
11     C                 b          110  9.0006
12     C                 c           68  8.9994
10     C                 a          110  4.0004
13     C                 d           68  3.9996
15     C                 f           70  1.0002
16     C                 g           70  0.9998
14     C                 e           70  0.0000
like image 75
ALollz Avatar answered Oct 14 '22 19:10

ALollz