Find the maximum values of a column in multiindex dataframe and return all its values

Reproducible code for the dataset:

import pandas as pd

df = {'player' : ['a','a','a','a','a','a','a','a','a','b','b','b','b','b','b','b','b','b','c','c','c','c','c','c','c','c','c'],
      'week' : ['1','1','1','2','2','2','3','3','3','1','1','1','2','2','2','3','3','3','1','1','1','2','2','2','3','3','3'],
      'category': ['RES','VIT','MATCH','RES','VIT','MATCH','RES','VIT','MATCH','RES','VIT','MATCH','RES','VIT','MATCH','RES','VIT','MATCH','RES','VIT','MATCH','RES','VIT','MATCH','RES','VIT','MATCH'],
      'energy' : [75,54,87,65,24,82,65,42,35,25,45,87,98,54,82,75,54,87,65,24,82,65,42,35,25,45,98] }

df = pd.DataFrame(data= df)
df = df[['player', 'week', 'category','energy']]

For each player, I need to find the week in which their energy was at its maximum, and display all the categories and energy values for that week.

So what I did was:

1. Set player and week as the index

2. Iterate over the index to find the max value of energy and return its value

df1 = df.set_index(['player', 'week'])

for index, row in df1.iterrows():
    group = df1.loc[df1['energy'].idxmax()]

Output Obtained:

                category energy
  player   week     
    b        2    RES      98
             2    VIT      54
             2   MATCH     82

This output is for the maximum energy in the entire dataset. I want the maximum for each player, along with all the other categories and their energy values for that week.

Expected Output:

I have tried using the groupby method, as suggested in the comments:

df.groupby(['player','week'])['energy'].max().groupby(level=['player','week'])

The output obtained is:

                energy  category
 player week        
   a     1        87    VIT
         2        82    VIT
         3        65    VIT
   b     1        87    VIT
         2        98    VIT
         3        87    VIT
   c     1        82    VIT
         2        65    VIT
         3        98    VIT
vishnu prashanth asked Apr 18 '18 18:04


2 Answers

Find the week with the maximum energy for each player, then select that week's rows for the player and concatenate the results across all players.

max_energy_idx = df.groupby('player')['energy'].idxmax()  # 2, 12, 26
max_energy_weeks = df['week'].loc[max_energy_idx]  # '1', '2', '3'
players = sorted(df['player'].unique())  # 'a', 'b', 'c'

result = pd.concat(
    [df.loc[(df['player'] == player) & (df['week'] == max_energy_week), :]
     for player, max_energy_week in zip(players, max_energy_weeks)]
)
>>> result
   player week category  energy
0       a    1      RES      75
1       a    1      VIT      54
2       a    1    MATCH      87
12      b    2      RES      98
13      b    2      VIT      54
14      b    2    MATCH      82
24      c    3      RES      25
25      c    3      VIT      45
26      c    3    MATCH      98

If desired, you can set the index on the result:

result = result.set_index(['player', 'week'])
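
If the frame already carries the (player, week) MultiIndex from the question, the same selection can probably be done without the concat loop. The following is only a sketch, assuming a reasonably recent pandas version. The names `df1`, `best`, and `result_mi` are illustrative and not part of the original post. Note that `idxmax` returns only the first maximum per player, so a player with tied weeks would get just one of them.

```python
import pandas as pd

# Same data as in the question, built more compactly.
df = pd.DataFrame({
    'player': list('a' * 9 + 'b' * 9 + 'c' * 9),
    'week': ['1', '1', '1', '2', '2', '2', '3', '3', '3'] * 3,
    'category': ['RES', 'VIT', 'MATCH'] * 9,
    'energy': [75, 54, 87, 65, 24, 82, 65, 42, 35,
               25, 45, 87, 98, 54, 82, 75, 54, 87,
               65, 24, 82, 65, 42, 35, 25, 45, 98],
})
df1 = df.set_index(['player', 'week'])

# On a MultiIndex, idxmax returns (player, week) tuples, one per player.
best = df1.groupby(level='player')['energy'].idxmax()

# Keep every row whose (player, week) pair is one of those tuples.
result_mi = df1[df1.index.isin(best.tolist())]
print(result_mi)
```

Filtering with `index.isin` keeps the rows in their original order and leaves the MultiIndex in place, so no `set_index` step is needed afterwards.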
Alexander answered Oct 26 '22 23:10


Taking your df with its original index (i.e. before setting the multiindex), you can get to your result in one line by performing an inner join with .merge:

df.merge(df.loc[df.groupby('player').energy.idxmax(), ['player', 'week']])

#   player week category  energy
# 0      a    1      RES      75
# 1      a    1      VIT      54
# 2      a    1    MATCH      87
# 3      b    2      RES      98
# 4      b    2      VIT      54
# 5      b    2    MATCH      82
# 6      c    3      RES      25
# 7      c    3      VIT      45
# 8      c    3    MATCH      98
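
To make the join easier to follow, the one-liner can be split into two steps with the merge keys written out. This is a sketch of the same idea rather than a different method: `keys` is an illustrative name, and `on=['player', 'week']` with `how='inner'` only spells out what `merge` infers by default here (the shared column names and an inner join).

```python
import pandas as pd

# Same data as in the question, built more compactly.
df = pd.DataFrame({
    'player': list('a' * 9 + 'b' * 9 + 'c' * 9),
    'week': ['1', '1', '1', '2', '2', '2', '3', '3', '3'] * 3,
    'category': ['RES', 'VIT', 'MATCH'] * 9,
    'energy': [75, 54, 87, 65, 24, 82, 65, 42, 35,
               25, 45, 87, 98, 54, 82, 75, 54, 87,
               65, 24, 82, 65, 42, 35, 25, 45, 98],
})

# One row per player: the (player, week) pair holding that player's top energy.
keys = df.loc[df.groupby('player')['energy'].idxmax(), ['player', 'week']]

# The inner join keeps only the rows of df whose (player, week) appears in keys.
result = df.merge(keys, on=['player', 'week'], how='inner')
print(result)
```

Because `idxmax` returns a single label per group, a player whose maximum occurs in more than one week would be matched on the first such week only.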
cmaher answered Oct 26 '22 23:10