I am trying to filter a DataFrame that has three columns. What I want to do is group by COL1 and COL2, get the max value of COL3, and also get the second-largest value of COL3, inserted as a new column, COL4.
I was able to group it using the code below, but I don't know how to get the second max and insert it as another column:
grouped = df.groupby(['COL1', 'COL2']).agg({'COL3': 'max'})
  COL1  COL2  COL3
0    A     1   0.2
1    A     1   0.4
3    B     4   0.7
Wanted output:
  COL1  COL2  COL3  COL4
0    A     1   0.4   0.2
3    B     4   0.7   0.7
You can also call reset_index() on your groupby result to get back a DataFrame in which the grouping columns are accessible as regular columns again. If you perform the operation on a single column, the result is a Series with a MultiIndex; you can simply pass it to pd.DataFrame and then call reset_index().
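As a rough illustration of the comment above, here is a minimal sketch using the sample data from the question. The variable names max_series and max_frame are made up for this example.

```python
import pandas as pd

# Sample data copied from the question (index labels 0, 1, 3).
df = pd.DataFrame(
    {"COL1": ["A", "A", "B"], "COL2": [1, 1, 4], "COL3": [0.2, 0.4, 0.7]},
    index=[0, 1, 3],
)

# Aggregating a single column returns a Series indexed by (COL1, COL2).
max_series = df.groupby(["COL1", "COL2"])["COL3"].max()

# reset_index() moves the group keys back into ordinary columns.
max_frame = max_series.reset_index()
print(max_frame)
```

Calling reset_index() directly on the Series should already give a DataFrame here, so wrapping it in pd.DataFrame first is optional.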
You can use .nlargest. The following solution takes advantage of the fact that the Series constructor will broadcast values to match the shape of the index, so a group with only one row gets the same value in both COL3 and COL4.
df.groupby(['COL1', 'COL2'])['COL3'].apply(
    lambda s: pd.Series(s.nlargest(2).values, index=['COL3', 'COL4'])
).unstack()
returns
           COL3  COL4
COL1 COL2            
A    1      0.4   0.2
B    4      0.7   0.7
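If you would rather not depend on the constructor's broadcasting, a variant along these lines might work. It is only a sketch: top_two is a hypothetical helper name, and it assumes every group has at least one non-null value in COL3.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"COL1": ["A", "A", "B"], "COL2": [1, 1, 4], "COL3": [0.2, 0.4, 0.7]},
    index=[0, 1, 3],
)

def top_two(s):
    """Hypothetical helper: largest and second-largest value of a group."""
    top = s.nlargest(2).tolist()
    # A single-row group has no second value, so repeat the max,
    # which matches the wanted output in the question.
    if len(top) == 1:
        top = top * 2
    return pd.Series(top, index=["COL3", "COL4"])

result = (
    df.groupby(["COL1", "COL2"])["COL3"]
      .apply(top_two)
      .unstack()       # turn the inner COL3/COL4 labels into columns
      .reset_index()   # bring COL1 and COL2 back as regular columns
)
print(result)
```

The explicit length check makes the single-row case visible in the code rather than leaving it to the constructor.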
First sort with sort_values so that head(2) picks up the largest and second-largest values in each group, then select the last of them with iat[-1], which avoids an error when a group has only one value:
grouped = (df.sort_values(['COL1','COL2','COL3'], ascending=[True, True, False])
             .groupby(['COL1', 'COL2'])['COL3']
             .agg(['max', lambda x: x.head(2).iat[-1]])
          )
grouped.columns = ['COL3','COL4']
grouped = grouped.reset_index()
print(grouped)
  COL1  COL2  COL3  COL4
0    A     1   0.4   0.2
1    B     4   0.7   0.7
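The same idea can probably be written with named aggregation, which sets the output column names directly and skips the separate renaming step. This is a self-contained sketch on the question's sample data, assuming a pandas version that supports keyword-style aggregation on a grouped column (0.25 or later).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"COL1": ["A", "A", "B"], "COL2": [1, 1, 4], "COL3": [0.2, 0.4, 0.7]},
    index=[0, 1, 3],
)

grouped = (
    df.sort_values(["COL1", "COL2", "COL3"], ascending=[True, True, False])
      .groupby(["COL1", "COL2"])["COL3"]
      .agg(
          COL3="max",
          # After the descending sort, head(2) holds the top two values;
          # iat[-1] is the second one, or the only one in a one-row group.
          COL4=lambda x: x.head(2).iat[-1],
      )
      .reset_index()
)
print(grouped)
```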