Python Pandas: Groupby Sum AND Concatenate Strings

Tags: python, pandas

Sample Pandas Dataframe:

ID Name COMMENT1 COMMENT2 NUM
1  dan  hi       hello    1
1  dan  you      friend   2
3  jon  yeah     nope     3
2  jon  dog      cat      .5
3  jon  yes      no       .1

I am trying to create a dataframe that groups by ID and Name, concatenates the strings in COMMENT1 and COMMENT2 within each group, and sums NUM.

This is what I'm looking for:

ID Name COMMENT1     COMMENT2        NUM
1  dan  hi you       hello friend    3
3  jon  yeah yes     nope no         3.1
2  jon  dog          cat             .5

I tried using this:

input_df = input_df.groupby(['ID', 'Name', 'COMMENT1', 'COMMENT2']).sum().reset_index()

But it doesn't work.

If I use this:

input_df = input_df.groupby(['ID']).sum().reset_index()

It sums the NUM column but leaves out all other columns.

kevintrankt asked Dec 01 '17 20:12

1 Answer

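This can be done in one line: group by ID and Name, then sum the float column and join the string columns with a space. Note that the dtype check assumes NUM is stored as float64.

df.groupby(['ID', 'Name'], as_index=False).agg(lambda x: x.sum() if x.dtype == 'float64' else ' '.join(x))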
Out[1510]: 
   ID Name  COMMENT1      COMMENT2  NUM
0   1  dan    hi you  hello friend  3.0
1   2  jon       dog           cat  0.5
2   3  jon  yeah yes       nope no  3.1
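
As an alternative to the dtype check, the aggregation for each column can be spelled out explicitly. The following is only a sketch, assuming the sample data from the question; the names df and result are illustrative, and sort=False is an optional choice made here so that groups keep the order in which they first appear, as in the desired output.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed dtypes: str comments, float NUM).
df = pd.DataFrame({
    "ID": [1, 1, 3, 2, 3],
    "Name": ["dan", "dan", "jon", "jon", "jon"],
    "COMMENT1": ["hi", "you", "yeah", "dog", "yes"],
    "COMMENT2": ["hello", "friend", "nope", "cat", "no"],
    "NUM": [1, 2, 3, 0.5, 0.1],
})

# Name the aggregation per column: join the strings with a space, sum the numbers.
# sort=False keeps groups in order of first appearance instead of sorting by key.
result = df.groupby(["ID", "Name"], as_index=False, sort=False).agg(
    {"COMMENT1": " ".join, "COMMENT2": " ".join, "NUM": "sum"}
)

print(result)
```

Listing the columns this way does not depend on NUM being float64, so it should also work if the numeric column happens to be stored as integers.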
BENY answered Sep 30 '22 18:09
