Example dataframe:
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['group1', 'group2', 'group3'] * 2,
'B': rand.rand(6),
'C': rand.rand(6),
'D': rand.rand(6)})
print df
A B C D
0 group1 0.417022 0.186260 0.204452
1 group2 0.720324 0.345561 0.878117
2 group3 0.000114 0.396767 0.027388
3 group1 0.302333 0.538817 0.670468
4 group2 0.146756 0.419195 0.417305
5 group3 0.092339 0.685220 0.558690
Groupby column A
group = df.groupby('A')
Use agg to return max value for each group
max1 = group['B'].agg({'max' : np.max})
print max1
max
A
group1 0.417022
group2 0.720324
group3 0.092339
But I would like to retain (or get back) the appropriate data in the other columns, C and D. This would be the remaining data for the row which contained the max value. So, the return should be:
A B C D
group1 0.417022 0.186260 0.204452
group2 0.720324 0.345561 0.878117
group3 0.092339 0.685220 0.558690
Can anybody show how to do this? Any help appreciated.
Groupby preserves the order of rows within each group.
Grouping by a DataFrame and keeping columns involves four steps: get the data, split it, apply a function, and combine the result. It would be best to learn data grouping in Pandas before seeing practical examples.
To apply aggregations to multiple columns, just add additional key:value pairs to the dictionary. Applying multiple aggregation functions to a single column will result in a multiindex. Working with multi-indexed columns is a pain and I'd recommend flattening this after aggregating by renaming the new columns.
Two stages: first find indices, then lookup all the rows.
idx = df.groupby('A').apply(lambda x: x['B'].argmax())
idx
Out[362]:
A
group1 0
group2 1
group3 5
df.loc[idx]
Out[364]:
A B C D
0 group1 0.417022 0.186260 0.204452
1 group2 0.720324 0.345561 0.878117
5 group3 0.092339 0.685220 0.558690
My answer is similar to FooBar but is done in one line by using idmax()
df.loc[df.groupby('A')['B'].idxmax()]
Result is the same:
In [51]: df
Out[51]:
A B C D
0 group1 0.417022 0.186260 0.204452
1 group2 0.720324 0.345561 0.878117
2 group3 0.000114 0.396767 0.027388
3 group1 0.302333 0.538817 0.670468
4 group2 0.146756 0.419195 0.417305
5 group3 0.092339 0.685220 0.558690
In [76]: df.loc[df.groupby('A')['B'].idxmax()]
Out[76]:
A B C D
0 group1 0.417022 0.186260 0.204452
1 group2 0.720324 0.345561 0.878117
5 group3 0.092339 0.685220 0.558690
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With