
When using Pandas .groupby, why use .agg versus calling the function directly, e.g. .sum()?

In Python, to obtain summaries by group, I use groupby().agg() with a function name, e.g. groupby('variable').agg('sum'). What is the difference between that and calling the function directly, e.g. groupby('variable').sum()?

Wael Hussein asked Oct 20 '25 01:10



1 Answer

Setup

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

The primary benefit of using agg is stated in the docs:

Aggregate using one or more operations over the specified axis.

If you have separate operations that need to be applied to each individual column, agg takes a dictionary (or a function, string, or list of strings/functions) that allows you to create that mapping in a single statement. So if you'd like the sum of column a, and the mean of column b:

df.agg({'a': 'sum', 'b': 'mean'})

a    6.0
b    5.0
dtype: float64
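The same dictionary form also works on a groupby object, which is the case the question asks about. This is a minimal sketch: the frame df_g and its grouping column 'key' are hypothetical and not part of the setup above.

```python
import pandas as pd

# Hypothetical data: 'key' is an assumed grouping column for illustration only.
df_g = pd.DataFrame({'key': ['x', 'x', 'y'], 'a': [1, 2, 3], 'b': [4, 5, 6]})

# Per group: sum of column a, mean of column b, in one statement.
per_group = df_g.groupby('key').agg({'a': 'sum', 'b': 'mean'})
print(per_group)
```

Each group gets one row, with a different operation applied to each column. Doing the same with .sum() and .mean() would take two separate calls and a step to combine the results.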

It also allows you to apply multiple operations to a single column in a single statement. For example, to find the sum, mean, and std of column a:

df.agg({'a': ['sum', 'mean', 'std']})

        a
sum   6.0
mean  2.0
std   1.0
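Applied to a groupby, a list of operations gives one column per operation and one row per group. The sketch below assumes a hypothetical frame df_g with a grouping column 'key'; neither is part of the original setup.

```python
import pandas as pd

# Hypothetical data: two rows per group so every statistic is defined.
df_g = pd.DataFrame({'key': ['x', 'x', 'y', 'y'], 'a': [1, 2, 3, 4]})

# Several operations on column a, computed per group in one call.
stats = df_g.groupby('key')['a'].agg(['sum', 'mean'])
print(stats)
```

The result is a DataFrame indexed by key, with columns named after the operations ('sum' and 'mean').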

There's no difference in outcome when you use agg with a single operation. I'd argue that df.agg('sum') is less clear than df.sum(), but the results will be the same:

df.agg('sum')

a     6
b    15
dtype: int64

df.sum()

a     6
b    15
dtype: int64
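The same equivalence can be checked on a groupby. This is a small sketch, again assuming a hypothetical frame df_g with a grouping column 'key':

```python
import pandas as pd

# Hypothetical data: 'key' is an assumed grouping column for illustration only.
df_g = pd.DataFrame({'key': ['x', 'x', 'y'], 'a': [1, 2, 3], 'b': [4, 5, 6]})

via_agg = df_g.groupby('key').agg('sum')   # single operation passed by name
via_method = df_g.groupby('key').sum()     # the direct method call

# DataFrame.equals compares values, index, columns and dtypes.
print(via_agg.equals(via_method))
```

With a single operation, the choice between the two forms comes down to readability.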

The main benefit agg provides is the convenience of applying multiple operations.

user3483203 answered Oct 21 '25 17:10



