Pandas groupby aggregate to new columns

Question

I have a DataFrame that looks something like this:

A    B    C   D
1    10   22  14
1    12   20  37
1    11   8   18
1    10   10  6
2    11   13  4
2    12   10  12
3    14   0   5

and a function that looks something like this (NOTE: it's actually doing something more complex that can't be easily separated into three independent calls, but I'm simplifying for clarity):

def myfunc(g):
    return min(g), mean(g), max(g)

I want to use groupby on A with myfunc to get an output on columns B and C (ignoring D) something like this:

                B               C
   min  mean  max  min  mean  max
A
1  10  10.75  12     8  15.0   22
2  11  11.50  12    10  11.5   13
3  14  14.00  14     0   0.0    0

I can do the following:

df2.groupby('A')[['B','C']].agg(
    {
        'min':  lambda g: myfunc(g)[0],
        'mean': lambda g: myfunc(g)[1],
        'max':  lambda g: myfunc(g)[2]
    })

But then—aside from this being ugly and calling myfunc multiple times—I end up with

  max       mean       min
    B   C      B     C   B   C
A
1  12  22  10.75  15.0  10   8
2  12  13  11.50  11.5  11  10
3  14   0  14.00   0.0  14   0

I can use .swaplevel(axis=1) to swap the column levels, but even then B and C are in multiple duplicated columns, and with the multiple function calls it feels like barking up the wrong tree.

unutbu · Accepted Answer

If you arrange for myfunc to return a DataFrame whose columns are ['A','B','C','D'] and whose rows index are ['min', 'mean', 'max'], then you could use groupby/apply to call the function (once for each group) and concatenate the results as desired:

import numpy as np
import pandas as pd

def myfunc(g):
    result = pd.DataFrame({'min':np.min(g),
                           'mean':np.mean(g),
                           'max':np.max(g)}).T
    return result

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3],
 'B': [10, 12, 11, 10, 11, 12, 14],
 'C': [22, 20, 8, 10, 13, 10, 0],
 'D': [14, 37, 18, 6, 4, 12, 5]})

result = df.groupby('A')[['B','C']].apply(myfunc)
result = result.unstack(level=-1)
print(result)

prints

      B                  C            
    max   mean   min   max  mean   min
A                                     
1  12.0  10.75  10.0  22.0  15.0   8.0
2  12.0  11.50  11.0  13.0  11.5  10.0
3  14.0  14.00  14.0   0.0   0.0   0.0

For others who may run across this and who do not need a custom function, note that it behooves you to always use builtin aggregators (below, specified by the strings 'min', 'mean' and 'max') if possible. They perform better than custom Python functions. Happily, in this toy problem, it produces the desired result:

In [99]: df.groupby('A')[['B','C']].agg(['min','mean','max'])
Out[99]: 
    B              C          
  min   mean max min  mean max
A                             
1  10  10.75  12   8  15.0  22
2  11  11.50  12  10  11.5  13
3  14  14.00  14   0   0.0   0

Pandas groupby aggregate to new columns

Tags:

python

pandas

multi-index

Dan

1 Answers

unutbu

Recent Activity

Donate For Us

Pandas groupby aggregate to new columns

Tags:

python

pandas

multi-index

Dan

1 Answers

unutbu

Related questions

Recent Activity

Donate For Us