I am looking to do some aggregation on a pandas groupby dataframe, where I need to apply several different custom functions on multiple columns. This operation is very easy and customary in R (using data.table or dplyr), but I am surprised I'm finding it so difficult in pandas:
import pandas as pd
data = pd.DataFrame({'A':[1,2,3,4,5,6],'B':[2,4,6,8,10,12],'C':[1,1,1,2,2,2]})
#These work
data.groupby('C').apply(lambda x: x.A.mean() - x.B.mean())
data.groupby('C').agg(['mean','std'])
#but this doesn't
data.groupby('C').agg([lambda x: x.A.mean() - x.B.mean(),
                       lambda x: len(x.A)])
I want to calculate a statistic but also the sample size in each group, which seems like it should be a one or two line solution, but I also sometimes need to apply multiple functions on multiple columns of the grouped data frame.
If you need a one-liner, you can do this:
#use apply instead of agg to create multiple columns
data.groupby('C').apply(lambda x: pd.Series([x.A.mean() - x.B.mean(), len(x.A)])).rename(columns={0:'diff',1:'a_len'})
Out[2346]:
   diff  a_len
C
1  -2.0    3.0
2  -5.0    3.0
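The same idea can be written with the labels supplied up front, so no rename step is needed. This is a minimal sketch, not the only way to do it: the helper name `summarize` is hypothetical, and selecting `[['A', 'B']]` before `apply` is an assumption made here so that the helper only ever receives the value columns, whatever the pandas version does with the grouping column.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                     'B': [2, 4, 6, 8, 10, 12],
                     'C': [1, 1, 1, 2, 2, 2]})

def summarize(group):
    # Hypothetical helper: the dict keys become the output column names
    return pd.Series({'diff': group['A'].mean() - group['B'].mean(),
                      'a_len': len(group)})

# Pass only the value columns to the helper; 'C' ends up as the index
result = data.groupby('C')[['A', 'B']].apply(summarize)
print(result)
```

Because the returned Series mixes a float and an integer, both columns come back as floats, just as in the output above.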
Another solution without using rename (note that the result carries an extra index level):
data.groupby('C').apply(lambda x: pd.DataFrame([[x.A.mean() - x.B.mean(), len(x.A)]],columns=['diff','a_len']))
Out[24]:
     diff  a_len
C
1 0  -2.0      3
2 0  -5.0      3
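When each statistic depends on a single column, named aggregation is another option, and a cross-column quantity such as the mean difference can then be derived from the aggregated columns. The sketch below assumes pandas 0.25 or later, where the keyword form of `agg` is available. The output names `a_mean`, `b_mean` and `a_len` are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                     'B': [2, 4, 6, 8, 10, 12],
                     'C': [1, 1, 1, 2, 2, 2]})

# Each keyword is an output column: (input column, aggregation)
summary = data.groupby('C').agg(
    a_mean=('A', 'mean'),
    b_mean=('B', 'mean'),
    a_len=('A', 'count'),  # number of non-missing values of A per group
)

# Derive the cross-column statistic from the per-column aggregates
summary['diff'] = summary['a_mean'] - summary['b_mean']
print(summary[['diff', 'a_len']])
```

This keeps the sample size as an integer and avoids the extra index level, but it only works when the statistic can be expressed in terms of per-column aggregates.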
We can write a function that computes the custom statistics from multiple columns and returns the result as a data frame.
>>> def meandiff_length(data):
...     data = data.copy()  # work on a copy; mutating the group passed by apply is unsafe
...     data['mean_diff'] = data.A.mean() - data.B.mean()
...     data['a_length'] = len(data.A)
...     return data
We can group the data and apply the custom function to the groups separately.
>>> data.groupby('C').apply(meandiff_length)
   A   B  C  mean_diff  a_length
0  1   2  1       -2.0         3
1  2   4  1       -2.0         3
2  3   6  1       -2.0         3
3  4   8  2       -5.0         3
4  5  10  2       -5.0         3
5  6  12  2       -5.0         3
This specific custom function returns the same value in every row of a group, so you may want to select the group and summary columns and call drop_duplicates to keep one row per group. However, this is a general solution that will also work when the custom function becomes more complex.
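The drop_duplicates step can be sketched as follows. Note the assumption: to keep the example behaving the same across pandas versions, the per-row columns are built here with `transform` rather than with the `apply` call above. The deduplication step is the same either way, and the names `expanded` and `summary` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                     'B': [2, 4, 6, 8, 10, 12],
                     'C': [1, 1, 1, 2, 2, 2]})

grouped = data.groupby('C')

# Broadcast each group's statistics back onto its rows
expanded = data.copy()
expanded['mean_diff'] = grouped['A'].transform('mean') - grouped['B'].transform('mean')
expanded['a_length'] = grouped['A'].transform('count')

# Keep only the group key and the summary columns, then collapse repeats
summary = (expanded[['C', 'mean_diff', 'a_length']]
           .drop_duplicates()
           .reset_index(drop=True))
print(summary)
```

Selecting the columns first matters: calling drop_duplicates on the full frame would remove nothing, because A and B differ from row to row.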