Pandas aggregate -- how to retain all columns

Tags: python, pandas, aggregate

Example dataframe:

import numpy as np
import pandas as pd

# Fixed seed so the random values are reproducible
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['group1', 'group2', 'group3'] * 2,
                   'B': rand.rand(6),
                   'C': rand.rand(6),
                   'D': rand.rand(6)})

print(df)

        A         B         C         D
0  group1  0.417022  0.186260  0.204452
1  group2  0.720324  0.345561  0.878117
2  group3  0.000114  0.396767  0.027388
3  group1  0.302333  0.538817  0.670468
4  group2  0.146756  0.419195  0.417305
5  group3  0.092339  0.685220  0.558690

Group by column A:

group = df.groupby('A')

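Use agg to return the max value of B for each group:

# A list of function names gives a DataFrame with one column per function;
# the older dict-renaming form agg({'max': np.max}) is rejected by recent pandas
max1 = group['B'].agg(['max'])
print(max1)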

             max
A               
group1  0.417022
group2  0.720324
group3  0.092339

But I would like to retain (or get back) the corresponding data in the other columns, C and D, that is, the rest of the row that contained the max value. So the result should be:

     A         B         C         D
group1  0.417022  0.186260  0.204452
group2  0.720324  0.345561  0.878117
group3  0.092339  0.685220  0.558690

Can anybody show how to do this? Any help appreciated.

256

asked Aug 19 '14 13:08

rdh9

2 Answers

Two stages: first find the index labels of the max rows, then look up those rows.

idx = df.groupby('A').apply(lambda x: x['B'].idxmax())
idx

Out[362]: 
A
group1    0
group2    1
group3    5

df.loc[idx]

Out[364]: 
        A         B         C         D
0  group1  0.417022  0.186260  0.204452
1  group2  0.720324  0.345561  0.878117
5  group3  0.092339  0.685220  0.558690
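
One caveat with the index lookup: idxmax returns only the first label when several rows in a group share the maximum. If all tied rows should be kept, a possible variation is to compare B against its per-group maximum. This is a minimal sketch, not part of the original answer; the names group_max and rows_at_max are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['group1', 'group2', 'group3'] * 2,
                   'B': rand.rand(6),
                   'C': rand.rand(6),
                   'D': rand.rand(6)})

# Broadcast each group's maximum of B back onto every row of that group
group_max = df.groupby('A')['B'].transform('max')

# Keep every row whose B equals its group's maximum (tied rows are all kept)
rows_at_max = df[df['B'] == group_max]
print(rows_at_max)
```

With the example data there are no ties, so this selects the same three rows as df.loc[idx].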

107

answered Sep 29 '22 07:09

FooBar

My answer is similar to FooBar's, but it is done in one line by using idxmax():

df.loc[df.groupby('A')['B'].idxmax()]

Result is the same:

In [51]: df
Out[51]: 
        A         B         C         D
0  group1  0.417022  0.186260  0.204452
1  group2  0.720324  0.345561  0.878117
2  group3  0.000114  0.396767  0.027388
3  group1  0.302333  0.538817  0.670468
4  group2  0.146756  0.419195  0.417305
5  group3  0.092339  0.685220  0.558690

In [76]: df.loc[df.groupby('A')['B'].idxmax()]
Out[76]: 
        A         B         C         D
0  group1  0.417022  0.186260  0.204452
1  group2  0.720324  0.345561  0.878117
5  group3  0.092339  0.685220  0.558690
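
The result above keeps the original row labels (0, 1, 5). To get the exact layout asked for in the question, with the group name as the index, one option is to call set_index('A') on the selected rows. A self-contained sketch, assuming the same example data; the variable names idx and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['group1', 'group2', 'group3'] * 2,
                   'B': rand.rand(6),
                   'C': rand.rand(6),
                   'D': rand.rand(6)})

# Row label of the maximum of B within each group of A
idx = df.groupby('A')['B'].idxmax()

# Select those rows, then use A as the index to mirror the desired output
result = df.loc[idx].set_index('A')
print(result)
```

Here result has one row per group, indexed by A, with columns B, C and D taken from the row where B was largest.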

answered Sep 29 '22 06:09

phi-j
