pandas: how to group by multiple columns and perform different aggregations on multiple columns?

Let's say I have a table that looks like this:

Company      Region     Date           Count         Amount
AAA          XXY        3-4-2018       766           8000
AAA          XXY        3-14-2018      766           8600
AAA          XXY        3-24-2018      766           2030
BBB          XYY        2-4-2018        66           3400
BBB          XYY        3-18-2018       66           8370
BBB          XYY        4-6-2018        66           1380

I want to get rid of the Date column, then group by Company AND Region to find the average of Count and the sum of Amount.

Expected output:

Company      Region     Count         Amount
AAA          XXY        766           18630
BBB          XYY        66            13150

I looked at the post below and many others online, but they all seem to perform only one kind of aggregation (for example, I can group by multiple columns but can only produce one output column as a sum OR a count, NOT a sum AND a count):

Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")

Can someone help?

What I did:

I followed this post here:

https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/

However, when I try the method presented toward the end of that article, which uses a nested dictionary:

aggregation = {
    'Count': {
        'Total Count': 'mean'
    },
    'Amount': {
        'Total Amount': 'sum'
    }
}

I would get this warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)

I know it works now, but I want to make sure my script keeps working later. How can I update my code so it stays compatible with future versions?

asked May 28 '18 15:05 by alwaysaskingquestions


2 Answers

You need to aggregate with a single, non-nested dictionary and then rename the columns:

aggregation = {'Count':  'mean', 'Amount': 'sum'}
cols_d = {'Count': 'Total Count', 'Amount': 'Total Amount'}

df = df.groupby(['Company','Region'], as_index=False).agg(aggregation).rename(columns=cols_d)
print(df)
  Company Region  Total Count  Total Amount
0     AAA    XXY          766         18630
1     BBB    XYY           66         13150

Another solution uses add_prefix instead of rename:

aggregation = {'Count':  'mean', 'Amount': 'sum'}
df = df.groupby(['Company','Region']).agg(aggregation).add_prefix('Total ').reset_index()
print(df)
  Company Region  Total Count  Total Amount
0     AAA    XXY          766         18630
1     BBB    XYY           66         13150
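
On pandas 0.25 or later, named aggregation may be a tidier way to aggregate and rename in one step. The sketch below is not part of the original answer; it assumes the sample data from the question, and the variable name result is only illustrative. Because the output names contain spaces, they are passed by unpacking a dictionary rather than as keyword arguments.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Company": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Region": ["XXY", "XXY", "XXY", "XYY", "XYY", "XYY"],
    "Date": ["3-4-2018", "3-14-2018", "3-24-2018",
             "2-4-2018", "3-18-2018", "4-6-2018"],
    "Count": [766, 766, 766, 66, 66, 66],
    "Amount": [8000, 8600, 2030, 3400, 8370, 1380],
})

# Named aggregation: output column name -> (input column, function).
# Requires pandas >= 0.25. The dict is unpacked because the output
# names contain spaces and cannot be written as plain keywords.
result = df.groupby(["Company", "Region"], as_index=False).agg(
    **{
        "Total Count": ("Count", "mean"),
        "Total Amount": ("Amount", "sum"),
    }
)
print(result)
```

In recent pandas versions the mean column usually comes back as a float (766.0 rather than 766), so cast it with astype(int) if an integer column is needed.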

answered Oct 23 '22 10:10 by jezrael


df.groupby(['Region', 'Company']).agg({'Count': 'mean', 'Amount': 'sum'}).reset_index()

outputs:

  Region Company  Count  Amount
0    XXY     AAA    766   18630
1    XYY     BBB     66   13150
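
Two details of this approach can be checked with a small self-contained sketch (an illustration that assumes the question's sample data; the name summary is hypothetical): the Date column disappears simply because it is not a key in the aggregation dictionary, and the key columns appear in the order they are passed to groupby, so listing Company first reproduces the column order of the expected output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Company": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Region": ["XXY", "XXY", "XXY", "XYY", "XYY", "XYY"],
    "Date": ["3-4-2018", "3-14-2018", "3-24-2018",
             "2-4-2018", "3-18-2018", "4-6-2018"],
    "Count": [766, 766, 766, 66, 66, 66],
    "Amount": [8000, 8600, 2030, 3400, 8370, 1380],
})

# Only columns named in the dict are aggregated, so Date is dropped.
# Key columns come out in the order given to groupby.
summary = (
    df.groupby(["Company", "Region"])
      .agg({"Count": "mean", "Amount": "sum"})
      .reset_index()
)
print(summary.columns.tolist())
```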

answered Oct 23 '22 09:10 by Haleemur Ali