In a pandas DataFrame, is it possible to collapse columns which have identical values, and sum up the values in another column?
Code
data = {"score":{"0":9.397,"1":9.397,"2":9.397995,"3":9.397996,"4":9.3999},"type":{"0":"advanced","1":"advanced","2":"advanced","3":"newbie","4":"expert"},"count":{"0":394.18930604,"1":143.14226729,"2":9.64172783,"3":0.1,"4":19.65413734}}
df = pd.DataFrame(data)
df
Output
count score type
0 394.189306 9.397000 advanced
1 143.142267 9.397000 advanced
2 9.641728 9.397995 advanced
3 0.100000 9.397996 newbie
4 19.654137 9.399900 expert
In the example above, the first two rows have the same score
and type
, so these rows should be merged together and their scores added up.
Desired Output
count score type
0 537.331573 9.397000 advanced
1 9.641728 9.397995 advanced
2 0.100000 9.397996 newbie
3 19.654137 9.399900 expert
Get Total of Column Using Series.sum() gets you the sum of a column. This is equivalent to the method numpy. sum . You can assign the sum of a column to a DataFrame to create a row.
Sum all columns in a Pandas DataFrame into new column If we want to summarize all the columns, then we can simply use the DataFrame sum() method.
To sum all the rows of a DataFrame, use the sum() function and set the axis value as 1. The value axis 1 will add the row values.
This is a job for groupby
:
>>> df.groupby(["score", "type"]).sum()
count
score type
9.397000 advanced 537.331573
9.397995 advanced 9.641728
9.397996 newbie 0.100000
9.399900 expert 19.6541374
>>> df.groupby(["score", "type"], as_index=False).sum()
score type count
0 9.397000 advanced 537.331573
1 9.397995 advanced 9.641728
2 9.397996 newbie 0.100000
3 9.399900 expert 19.654137
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With