
pandas groupby() with custom aggregate function and put result in a new column

Suppose I have a dataframe with 3 columns. I want to group it by one of the columns and compute a new value for each group using a custom aggregate function.

This new value has a completely different meaning, and its column does not exist in the original dataframe. In effect, I want to change the shape of the dataframe during the groupby() + agg() transformation: the original dataframe has the columns (foo, bar, baz) and a range index, while the result should have a single column (qux) with baz as the index.

import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['a', 'b', 'c'], 'baz': [0, 0, 1]})
df.head()

#        foo    bar    baz
#   0      1      a      0
#   1      2      b      0
#   2      3      c      1    

def calc_qux(gdf, **kw):
    qux = ','.join(map(str, gdf['foo'])) + ''.join(gdf['bar'])
    return (None, None)  # but I want (None, None, qux)

df = df.groupby('baz').agg(calc_qux, axis=1)  # selecting ['qux'] afterwards fails, since 'qux' is not a column of the frame
df.head()

#      qux
# baz       
#   0  1,2ab
#   1  3c

The code above raises ValueError: Shape of passed values is (2, 3), indices imply (2, 2) whenever the aggregation function returns a different number of values than there are columns in the original dataframe.

Ivan Velichko asked Nov 08 '18 15:11


1 Answer

Use apply() here rather than agg(), since your function combines several columns of each group instead of reducing a single column (which is the case agg() is suited for):

import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['a', 'b', 'c'], 'baz': [0, 0, 1]})

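def calc_qux(x):
    # x is the sub-DataFrame for one 'baz' group: join foo with commas, then append the concatenated bar values
    return ','.join(x['foo'].astype(str).values) + ''.join(x['bar'].values)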

df.groupby('baz').apply(calc_qux).to_frame('qux')

Yields:

       qux
baz       
0    1,2ab
1       3c
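
If you would rather stay with agg(), one possible alternative is to reduce each column on its own and combine the pieces afterwards. This is only a sketch and not part of the approach above: it assumes pandas 0.25 or later (for named aggregation), and the intermediate column names foo_joined and bar_joined are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['a', 'b', 'c'], 'baz': [0, 0, 1]})

# With named aggregation, each function receives one column (a Series)
# of one group, so foo and bar are reduced independently.
# 'foo_joined' and 'bar_joined' are hypothetical helper column names.
parts = df.groupby('baz').agg(
    foo_joined=('foo', lambda s: ','.join(s.astype(str))),
    bar_joined=('bar', lambda s: ''.join(s)),
)

# Concatenate the two per-group strings into the single 'qux' column;
# 'baz' stays as the index.
result = (parts['foo_joined'] + parts['bar_joined']).to_frame('qux')
print(result)
```

This only works because qux can be assembled from independent per-column results. When the value for a group depends on several columns at once, apply() remains the simpler choice.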
rahlf23 answered Sep 30 '22 17:09