I am trying to define an aggregation function with more than one output column, which I would like to use as follows:
df.groupby(by=...).agg(my_aggregation_function_with_multiple_columns)
Any idea how to do it?
I tried things like:
def my_aggregation_function_with_multiple_columns(slice_values):
    return {'col_1': -1, 'col_2': 1}
but, logically enough, this puts the dictionary {'col_1': -1, 'col_2': 1} into a single column...
This is not possible with agg, because agg works on each column separately: it processes the first column, then the second, and so on to the last.
The solution is the more flexible apply, which passes each group to the function as a whole.
To return multiple outputs, return a Series when the output consists of several scalars:
def my_aggregation_function_with_multiple_columns(slice_values):
    return pd.Series([-1, 1], index=['col_1', 'col_2'])
df.groupby(by=...).apply(my_aggregation_function_with_multiple_columns)
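As a runnable version of the pattern above, here is a minimal sketch. The frame, the helper name summarize_group, and the choice of sum and max as the two outputs are only assumptions for illustration; they are not from the question.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df = pd.DataFrame(dict(A=[1, 1, 2, 2, 3], B=[4, 5, 6, 7, 2], C=[1, 2, 4, 6, 9]))

def summarize_group(group):
    # group is the sub-DataFrame holding the rows of one value of A
    return pd.Series(
        [group['B'].sum(), group['C'].max()],
        index=['col_1', 'col_2'],
    )

# Selecting the value columns explicitly keeps the grouping column
# out of the frame that is passed to the function
result = df.groupby('A')[['B', 'C']].apply(summarize_group)
print(result)
```

Because the function returns a Series of scalars for each group, the result should be a DataFrame indexed by A with one column per Series label (col_1 and col_2).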
Sample:
import pandas as pd

df = pd.DataFrame(dict(A=[1,1,2,2,3], B=[4,5,6,7,2], C=[1,2,4,6,9]))
print (df)

def my_aggregation_function_with_multiple_columns(slice_values):
    # slice_values is the sub-DataFrame of one group
    # uncomment to print each group
    #print (slice_values)
    a = slice_values['B'] + slice_values['C'].shift()
    print (type(a))
    return a

df = df.groupby('A').apply(my_aggregation_function_with_multiple_columns)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
print (df)
A
1  0     NaN
   1     6.0
2  2     NaN
   3    11.0
3  4     NaN
dtype: float64
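For contrast, and to illustrate the point that agg handles columns one at a time, here is a small sketch. The helper name value_range is hypothetical and the data is illustrative; the exact internal call pattern can differ between pandas versions.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df = pd.DataFrame(dict(A=[1, 1, 2, 2, 3], B=[4, 5, 6, 7, 2], C=[1, 2, 4, 6, 9]))

def value_range(values):
    # With agg, values is expected to be one column (a Series) of one group
    return values.max() - values.min()

per_column = df.groupby('A').agg(value_range)
print(per_column)
```

The result keeps the input columns B and C, with one scalar per group in each, which is why a function passed to agg cannot create new output columns on its own.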