I am trying to filter a DataFrame that has 3 columns. What I want to do is group by COL1 and COL2, get the max value of COL3, and also get the second max value of COL3 and insert it as a new column, COL4.
I was able to do the grouping with the code below, but I don't know how to get the second max and insert it as another column:
grouped = df.groupby(['COL1', 'COL2']).agg({'COL3': 'max'})
Sample data:
   COL1  COL2  COL3
0     A     1   0.2
1     A     1   0.4
3     B     4   0.7
Wanted output:
   COL1  COL2  COL3  COL4
0     A     1   0.4   0.2
3     B     4   0.7   0.7
You can also call reset_index() on your groupby result to get back a DataFrame in which the grouping columns are accessible as regular columns again. If you perform an operation on a single column, the result is a Series with a MultiIndex; you can simply wrap it in pd.DataFrame and then call reset_index().
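As a minimal sketch of that point, assuming the sample data from the question (the variable names max_series and max_frame are only illustrative), aggregating a single column and then resetting the index could look like this:

```python
import pandas as pd

# Sample frame copied from the question; the index labels 0, 1, 3 are kept as shown.
df = pd.DataFrame(
    {"COL1": ["A", "A", "B"], "COL2": [1, 1, 4], "COL3": [0.2, 0.4, 0.7]},
    index=[0, 1, 3],
)

# Aggregating one column returns a Series indexed by (COL1, COL2).
max_series = df.groupby(["COL1", "COL2"])["COL3"].max()

# Wrapping in pd.DataFrame and calling reset_index() turns the
# group keys back into ordinary columns.
max_frame = pd.DataFrame(max_series).reset_index()
print(max_frame)
```

Calling max_series.reset_index() directly should give the same frame, so the pd.DataFrame wrapper is optional here.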
You can use .nlargest. The following solution takes advantage of the fact that the Series constructor will broadcast values to match the shape of the index, so a group with only one row gets that single value in both COL3 and COL4.
df.groupby(['COL1', 'COL2'])['COL3'].apply(
    lambda s: pd.Series(s.nlargest(2).values, index=['COL3', 'COL4'])
).unstack()
returns
           COL3  COL4
COL1 COL2
A    1      0.4   0.2
B    4      0.7   0.7
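If you would rather not depend on the constructor's broadcasting, here is a sketch of the same nlargest idea that pads single-row groups explicitly. The helper name top_two is hypothetical, and it assumes every group has at least one non-NaN value in COL3:

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame(
    {"COL1": ["A", "A", "B"], "COL2": [1, 1, 4], "COL3": [0.2, 0.4, 0.7]},
    index=[0, 1, 3],
)


def top_two(s: pd.Series) -> pd.Series:
    """Return the largest and second-largest values of s as COL3 and COL4.

    Hypothetical helper; assumes s holds at least one non-NaN value.
    """
    top = s.nlargest(2).tolist()
    if len(top) == 1:
        # Only one row in the group: reuse the max as the "second max".
        top.append(top[0])
    return pd.Series(top, index=["COL3", "COL4"])


result = (
    df.groupby(["COL1", "COL2"])["COL3"]
    .apply(top_two)   # Series indexed by (COL1, COL2, COL3/COL4 label)
    .unstack()        # move the COL3/COL4 labels into columns
    .reset_index()    # group keys back to regular columns
)
print(result)
```

The explicit padding makes the single-row behaviour visible in the code instead of leaving it to the Series constructor.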
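First use sort_values so that the largest COL3 value comes first within each group. Then aggregate: max gives the largest value, and head(2) gives the first and second max values, from which iat[-1] selects the last one. Selecting with iat[-1] avoids an error when a group has only one value: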
grouped = (df.sort_values(['COL1', 'COL2', 'COL3'], ascending=[True, True, False])
             .groupby(['COL1', 'COL2'])['COL3']
             .agg(['max', lambda x: x.head(2).iat[-1]])
          )
grouped.columns = ['COL3', 'COL4']
grouped = grouped.reset_index()
print(grouped)
  COL1  COL2  COL3  COL4
0    A     1   0.4   0.2
1    B     4   0.7   0.7
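To see why iat[-1] is used rather than a fixed position, here is a small sketch on two hand-made Series that stand in for groups already sorted in descending order (the names two, one and raised are only illustrative):

```python
import pandas as pd

two = pd.Series([0.4, 0.2])  # stands in for a sorted group with two rows
one = pd.Series([0.7])       # stands in for a group with a single row

# Last of the top two rows: the second max when the group has two or more rows.
second_of_two = two.head(2).iat[-1]

# head(2) returns just one row here, so iat[-1] falls back to the max itself.
second_of_one = one.head(2).iat[-1]

# A fixed position would fail on the single-row group.
raised = False
try:
    one.iat[1]
except IndexError:
    raised = True

print(second_of_two, second_of_one, raised)
```

This matches the B row in the output above, where COL3 and COL4 are both 0.7.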