 

Python: Keep other columns when using sum() with groupby

Tags:

python

pandas

I have the pandas DataFrame below:

    df
       name  value1  value2  otherstuff1  otherstuff2
    0  Jack       1       1         1.19         2.39
    1  Jack       1       2         1.19         2.39
    2  Luke       0       1         1.08         1.08
    3  Mark       0       1         3.45         3.45
    4  Luke       1       0         1.08         1.08
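To make the examples in this thread reproducible, here is a minimal sketch that rebuilds the sample DataFrame from the table above. The values are copied from the question; the numeric dtypes (ints for the value columns, floats for the otherstuff columns) are an assumption, since the post only shows printed output.

```python
import pandas as pd

# Rebuild the sample DataFrame shown in the question
df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Luke', 'Mark', 'Luke'],
    'value1': [1, 1, 0, 0, 1],
    'value2': [1, 2, 1, 1, 0],
    'otherstuff1': [1.19, 1.19, 1.08, 3.45, 1.08],
    'otherstuff2': [2.39, 2.39, 1.08, 3.45, 1.08],
})
print(df)
```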

Rows with the same "name" always have the same values for otherstuff1 and otherstuff2.
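Both answers rely on this guarantee. One way to sanity-check it, sketched here on the sample data from the question, is to count the distinct otherstuff values per name with `nunique()`; the names `distinct_counts` and `assumption_holds` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Luke', 'Mark', 'Luke'],
    'value1': [1, 1, 0, 0, 1],
    'value2': [1, 2, 1, 1, 0],
    'otherstuff1': [1.19, 1.19, 1.08, 3.45, 1.08],
    'otherstuff2': [2.39, 2.39, 1.08, 3.45, 1.08],
})

# Number of distinct values of each "other" column within every name
distinct_counts = df.groupby('name')[['otherstuff1', 'otherstuff2']].nunique()

# The guarantee holds only if every count is exactly 1
assumption_holds = bool((distinct_counts == 1).all().all())
print(assumption_holds)
```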

I'm trying to group by column 'name' and sum column 'value1' and column 'value2' (not value1 plus value2! Each column should be summed individually.)
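For reference, selecting just the two value columns after `groupby` already sums each column separately within every group; it never adds value1 to value2. This minimal sketch uses the sample data from the question. Note that the result contains only `name`, `value1` and `value2`, which is why the other columns still need separate handling.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Luke', 'Mark', 'Luke'],
    'value1': [1, 1, 0, 0, 1],
    'value2': [1, 2, 1, 1, 0],
    'otherstuff1': [1.19, 1.19, 1.08, 3.45, 1.08],
    'otherstuff2': [2.39, 2.39, 1.08, 3.45, 1.08],
})

# sum() runs column by column inside each group, so value1 and value2
# get their own totals per name
sums = df.groupby('name', as_index=False)[['value1', 'value2']].sum()
print(sums)
```

By contrast, `df[['value1', 'value2']].sum(axis=1)` adds the two columns together row by row, which is the operation the question wants to avoid.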

I expect to get the result below:

    newdf
       name  value1  value2  otherstuff1  otherstuff2
    0  Jack       2       3         1.19         2.39
    1  Luke       1       1         1.08         1.08
    2  Mark       0       1         3.45         3.45

I've tried

    newdf = df.groupby(['name'], as_index=False).sum()

which groups by name and sums both value1 and value2 correctly, but ends up dropping columns otherstuff1 and otherstuff2.

Please help. Thank you guys so much!

SwagZ asked Apr 11 '18 19:04




2 Answers

You should specify what pandas must do with the other columns. In your case, I think you want to keep one row, regardless of its position within the group.

This can be done with agg on the grouped object. agg accepts a dict that maps each column name to the operation to perform on that column:

    newdf = df.groupby(['name'], as_index=False).agg({'value1': 'sum', 'value2': 'sum', 'otherstuff1': 'first', 'otherstuff2': 'first'})
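A self-contained version of the `agg` approach, assuming the sample data from the question. When there are many columns, the column-to-function mapping can be built programmatically instead of typed by hand; `value_cols`, `other_cols` and `agg_map` are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Luke', 'Mark', 'Luke'],
    'value1': [1, 1, 0, 0, 1],
    'value2': [1, 2, 1, 1, 0],
    'otherstuff1': [1.19, 1.19, 1.08, 3.45, 1.08],
    'otherstuff2': [2.39, 2.39, 1.08, 3.45, 1.08],
})

value_cols = ['value1', 'value2']            # columns to add up per name
other_cols = ['otherstuff1', 'otherstuff2']  # columns constant within a name

# 'sum' for the value columns, 'first' for the per-name constants
agg_map = {col: 'sum' for col in value_cols}
agg_map.update({col: 'first' for col in other_cols})

newdf = df.groupby('name', as_index=False).agg(agg_map)
print(newdf)
```

`'first'` takes the first non-null value in each group, so it only amounts to "keeping the value" if the column really is constant within a name.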
Guybrush answered Sep 22 '22 01:09


Something like this? (Assuming rows with the same name always have the same otherstuff1 and otherstuff2.)

    df.groupby(['name', 'otherstuff1', 'otherstuff2'], as_index=False).sum()
    Out[121]:
       name  otherstuff1  otherstuff2  value1  value2
    0  Jack         1.19         2.39       2       3
    1  Luke         1.08         1.08       1       1
    2  Mark         3.45         3.45       0       1
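A runnable sketch of this multi-key grouping on the sample data from the question. The final column reordering is an addition not in the original answer, used here so the result matches the layout the question asked for.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Luke', 'Mark', 'Luke'],
    'value1': [1, 1, 0, 0, 1],
    'value2': [1, 2, 1, 1, 0],
    'otherstuff1': [1.19, 1.19, 1.08, 3.45, 1.08],
    'otherstuff2': [2.39, 2.39, 1.08, 3.45, 1.08],
})

# The per-name constants become extra group keys, so sum() only
# aggregates the remaining value columns
newdf = df.groupby(['name', 'otherstuff1', 'otherstuff2'], as_index=False).sum()

# Restore the column order of the original frame
newdf = newdf[df.columns]
print(newdf)
```

One caveat with this design: by default, rows whose grouping keys contain NaN are dropped, so if otherstuff1 or otherstuff2 can be missing, the `agg` approach (which groups on name only) is the safer choice.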
BENY answered Sep 23 '22 01:09