
pandas aggregate function with multiple output columns

I am trying to define an aggregation function that returns more than one output column, which I would like to use as follows:

df.groupby(by=...).agg(my_aggregation_function_with_multiple_columns)

Any idea how to do it?

I tried things like:

def my_aggregation_function_with_multiple_columns(slice_values):
    return {'col_1': -1,'col_2': 1}

but this, logically, puts the dictionary {'col_1': -1,'col_2': 1} into a single column.

asked Aug 29 '17 12:08 by register


1 Answer

This is not possible with agg, because agg works on each column separately: it processes the first column, then the second, and so on to the last.
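To illustrate the column-by-column behaviour, here is a minimal sketch. The helper name value_range and the sample data are assumptions chosen for illustration. The function receives one column of one group at a time, so the result keeps one output column per input column.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the question).
df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [4, 5, 6, 7, 2], "C": [1, 2, 4, 6, 9]})

def value_range(column_values):
    # Hypothetical helper: receives a single column of a single group
    # as a Series and reduces it to one scalar.
    return column_values.max() - column_values.min()

# agg applies value_range to B and to C independently within each group.
result = df.groupby("A").agg(value_range)
print(result)
```

Because each call sees only one column, the function has no way to spread its return value over several new columns.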

The solution is the more flexible apply: to get multiple output columns, return a Series whenever the function produces more than one scalar.

def my_aggregation_function_with_multiple_columns(slice_values):
    return pd.Series([-1, 1], index=['col_1','col_2'])

df.groupby(by=...).apply(my_aggregation_function_with_multiple_columns)
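As a runnable sketch of this pattern, the example below returns one Series per group, whose index labels become the output columns. The helper name summarize, the sample data, and the choice of sum and mean are assumptions made for illustration only.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the question).
df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [4, 5, 6, 7, 2], "C": [1, 2, 4, 6, 9]})

def summarize(group):
    # Hypothetical helper: receives the whole group as a DataFrame and
    # returns a Series; each index label becomes an output column.
    return pd.Series({"col_1": group["B"].sum(), "col_2": group["C"].mean()})

# Selecting B and C explicitly keeps the grouping column out of each group.
result = df.groupby("A")[["B", "C"]].apply(summarize)
print(result)
```

Because every group returns a Series with the same index, pandas stacks them into a DataFrame with one row per group and the columns col_1 and col_2.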

Sample:

import pandas as pd

df = pd.DataFrame(dict(A=[1,1,2,2,3], B=[4,5,6,7,2], C=[1,2,4,6,9]))
print (df)

def my_aggregation_function_with_multiple_columns(slice_values):
    #print each group
    #print (slice_values)
    a = slice_values['B'] + slice_values['C'].shift()
    print (type(a))
    return a

df = df.groupby('A').apply(my_aggregation_function_with_multiple_columns)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>

print (df)
A   
1  0     NaN
   1     6.0
2  2     NaN
   3    11.0
3  4     NaN
dtype: float64
answered Oct 14 '22 23:10 by jezrael