I'm struggling to figure out how to combine two different syntaxes for pandas' dataframe.agg() function.  Take this simple data frame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})
>>> df
[output]
        A   B    C
0  group1  10  100
1  group1  12  102
2  group2  10  100
3  group2  25  250
4  group3  10  100
5  group3  12  102
I know you can send two functions to agg() and get a new data frame where each function is applied to each column:
df.groupby('A').agg([np.mean, np.std])
[output]
           B                C            
        mean        std  mean         std
A                                        
group1  11.0   1.414214   101    1.414214
group2  17.5  10.606602   175  106.066017
group3  11.0   1.414214   101    1.414214
And I know you can pass arguments to a single function:
df.groupby('A').agg(np.std, ddof=0)
[output]
          B   C
A              
group1  1.0   1
group2  7.5  75
group3  1.0   1
But is there a way to pass multiple functions along with arguments for one or both of them?  I was hoping to find something like df.groupby('A').agg([np.mean, (np.std, ddof=0)]) in the docs, but so far no luck.  Any ideas?
The docs on aggregate are indeed a bit thin on this. There may be a way to do it by passing the arguments directly, and the pandas source code would be the place to check (perhaps I will later).
However, you could easily do:
df.groupby('A').agg([np.mean, lambda x: np.std(x, ddof=0)])
This works just as well, because the lambda fixes ddof=0 itself, so agg() only ever receives plain one-argument functions.
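A variation on the same idea, as a minimal sketch: wrap the extra argument in a small named function instead of a lambda. The name std_pop is a hypothetical helper chosen here for illustration, not something pandas or NumPy provides, and the string 'mean' is used in place of np.mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})


def std_pop(x):
    """Population standard deviation (ddof=0). Hypothetical helper name."""
    return np.std(x, ddof=0)


# Each entry in the list is applied to every non-grouping column;
# the function's name becomes the second level of the column index.
result = df.groupby('A').agg(['mean', std_pop])
print(result)
```

Compared with the lambda, the named function gives the result column a readable label (std_pop) rather than something like <lambda>, which helps if several custom functions are passed at once.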