Sample Pandas Dataframe:
ID  Name  COMMENT1  COMMENT2  NUM
1   dan   hi        hello     1
1   dan   you       friend    2
3   jon   yeah      nope      3
2   jon   dog       cat       .5
3   jon   yes       no        .1
I am trying to create a DataFrame that groups by ID and Name, concatenates COMMENT1 and COMMENT2 within each group, and sums NUM.
This is what I'm looking for:
ID  Name  COMMENT1  COMMENT2      NUM
1   dan   hi you    hello friend  3
3   jon   yeah yes  nope no       3.1
2   jon   dog       cat           .5
I tried using this:
input_df = input_df.groupby(['ID', 'NAME', 'COMMENT1', 'COMMENT2']).sum().reset_index()
But it doesn't work.
If I use this:
input_df = input_df.groupby(['ID']).sum().reset_index()
It sums the NUM column but leaves out all other columns.
Use DataFrame.groupby().sum() to group rows by one or more columns and compute the sum for each group. groupby() returns a DataFrameGroupBy object, whose sum() aggregate method calculates the sum of each column per group.
In SQL, the GROUP BY statement groups rows that share the same value in the specified columns, usually so that an aggregate function such as MAX() or SUM() can be applied to each group. It is used with the SELECT command.
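As a minimal sketch of the groupby().sum() pattern described above, the snippet below rebuilds the sample data from the question and sums only NUM per (ID, Name) group. The variable names df and num_totals are illustrative, not from the original post.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": [1, 1, 3, 2, 3],
    "Name": ["dan", "dan", "jon", "jon", "jon"],
    "COMMENT1": ["hi", "you", "yeah", "dog", "yes"],
    "COMMENT2": ["hello", "friend", "nope", "cat", "no"],
    "NUM": [1, 2, 3, 0.5, 0.1],
})

# Select NUM explicitly so that only the numeric column is summed per group.
num_totals = df.groupby(["ID", "Name"], as_index=False)["NUM"].sum()
print(num_totals)
```

This only covers the summing half of the question; the comment columns need a separate string aggregation.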
This can be done in one line:
df.groupby(['ID','Name'],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
Out[1510]:
   ID Name  COMMENT1      COMMENT2  NUM
0   1  dan    hi you  hello friend  3.0
1   2  jon       dog           cat  0.5
2   3  jon  yeah yes       nope no  3.1
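Note that the lambda above relies on NUM having dtype float64; if NUM were an integer column, the dtype check would fail and ' '.join would be applied to numbers, raising a TypeError. One possible alternative, sketched here with the sample data from the question, is to pass agg a dictionary that names the aggregation for each column explicitly. The variable names df and result are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": [1, 1, 3, 2, 3],
    "Name": ["dan", "dan", "jon", "jon", "jon"],
    "COMMENT1": ["hi", "you", "yeah", "dog", "yes"],
    "COMMENT2": ["hello", "friend", "nope", "cat", "no"],
    "NUM": [1, 2, 3, 0.5, 0.1],
})

# Map each column to its own aggregation: join strings with a space, sum numbers.
result = df.groupby(["ID", "Name"], as_index=False).agg({
    "COMMENT1": " ".join,
    "COMMENT2": " ".join,
    "NUM": "sum",
})
print(result)
```

Spelling out the columns makes the intent visible and does not depend on the dtype of NUM, at the cost of having to list every column you want to keep.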