
Group by one column and find the sum and max value of other columns in pandas

I have a dataframe like this:

Name  id  col1  col2  col3  col4
PL    252  0     747   3     53  
PL2   252  1     24    2     35 
PL3   252  4     75    24    13 
AD    889  53    24    0     95 
AD2   889  23    2     0     13  
AD3   889  0     24    3     6  
BG    024  12    89    53    66 
BG1   024  43    16    13    0   
BG2   024  5     32    101   4   

Now I need to group by id. For col1 and col4, I want the sum for each id, placed in a new column next to the source column (for example, col1(sum)). For col2 and col3, I want the max value for each id instead. Desired output:

Name  id  col1 col1(sum) col2 col2(max) col3 col3(max) col4 col4(sum)
PL    252  0       5      747    747     3     24    6    18
PL2   252  1       5      24     747     2     24    12   18
PL3   252  4       5      75     747     24    24    0    18
AD    889  53      76     24     24      95    95    23   33
AD2   889  23      76     2      24      13    95    5    33
AD3   889  0       76     24     24      6     95    5    33
BG    024  12      60     89     89      66    66    0    67   
BG1   024  43      60     16     89      0     66    63   67    
BG2   024  5       60     32     89      4     66    4    67    

What is the easiest and fastest way to calculate this?

jovicbg asked Jun 23 '17 14:06




2 Answers

The most pandas-native way to do this is the .agg() method, which lets you specify the aggregation function to apply to each column (much as you would in SQL).

Sample from the documentation:

df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})
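Applied to the dataframe from the question, that could look like the sketch below. The frame is rebuilt by hand from the input table, and id is stored as a string (an assumption, made so that "024" keeps its leading zero). Note that .agg() returns one row per id; to put the group value on every original row, as the desired output shows, groupby(...).transform() is one option. That is a different call from the one in this answer.

```python
import pandas as pd

# Input table from the question; storing id as a string is an assumption.
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id": ["252"] * 3 + ["889"] * 3 + ["024"] * 3,
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
    "col3": [3, 2, 24, 0, 0, 3, 53, 13, 101],
    "col4": [53, 35, 13, 95, 13, 6, 66, 0, 4],
})

# One aggregation per column: sum for col1/col4, max for col2/col3.
funcs = {"col1": "sum", "col2": "max", "col3": "max", "col4": "sum"}

# .agg() collapses each group to a single row, indexed by id.
per_id = df.groupby("id").agg(funcs)

# .transform() keeps the original row count, repeating the group value per row.
out = df.copy()
for col, fn in funcs.items():
    out[f"{col}({fn})"] = df.groupby("id")[col].transform(fn)

print(per_id)
print(out)
```

Here per_id has three rows (one per id), while out keeps all nine rows and gains the four new columns at the end; reorder the columns afterwards if each new column must sit next to its source column.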
Maresh answered Oct 16 '22 19:10



You can use merge to join the per-id sums from groupby back onto the original rows:

pd.merge(df, df.groupby("id").sum(numeric_only=True).reset_index(), on="id", how="outer", suffixes=("", "(sum)"))
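The merge above sums every numeric column. A minimal sketch of the same merge idea with the mix the question asks for (sum for col1 and col4, max for col2 and col3) follows. The frame is rebuilt by hand from the question's input table, id is assumed to be a string, and the funcs mapping and the renamed column labels are illustrative choices rather than part of this answer.

```python
import pandas as pd

# Input table from the question; storing id as a string is an assumption.
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id": ["252"] * 3 + ["889"] * 3 + ["024"] * 3,
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
    "col3": [3, 2, 24, 0, 0, 3, 53, 13, 101],
    "col4": [53, 35, 13, 95, 13, 6, 66, 0, 4],
})

# Which aggregation to use for each column (illustrative mapping).
funcs = {"col1": "sum", "col2": "max", "col3": "max", "col4": "sum"}

# Aggregate once per id, then label each result column like "col1(sum)".
stats = df.groupby("id").agg(funcs)
stats = stats.rename(columns={col: f"{col}({fn})" for col, fn in funcs.items()})

# A left merge keeps every original row, in its original order.
merged = df.merge(stats.reset_index(), on="id", how="left")

print(merged)
```

Renaming before the merge avoids the automatic _x/_y suffixes that pandas adds when both sides share column names.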


Tbaki answered Oct 16 '22 18:10
