Beginner question. This seems like it should be a straightforward operation, but I can't figure it out from reading the docs.
I have a df with this structure:
|integer_id|int_field_1|int_field_2|
The integer_id column is non-unique, so I'd like to group the df by integer_id and sum the two fields.
The equivalent SQL is:
SELECT integer_id, SUM(int_field_1), SUM(int_field_2) FROM tbl
GROUP BY integer_id
Any suggestions on the simplest way to do this?
EDIT: Including input/output
Input:
integer_id int_field_1 int_field_2
2656 36 36
2656 36 36
9702 2 2
9702 1 1
Output using df.groupby('integer_id').sum():
integer_id int_field_1 int_field_2
2656 72 72
9702 3 3
Use DataFrame.groupby().sum() to group rows by one or more columns and compute the sum for each group. groupby() returns a DataFrameGroupBy object, which exposes an aggregate method sum() that calculates the sum of each column within every group.
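For instance, a minimal sketch that reproduces the input from the question (the name df is just what the question uses; the printed result is what pandas returns for this data):

import pandas as pd

# Rebuild the example input from the question
df = pd.DataFrame({
    'integer_id':  [2656, 2656, 9702, 9702],
    'int_field_1': [36, 36, 2, 1],
    'int_field_2': [36, 36, 2, 1],
})

# Group on the non-unique id and sum every remaining column per group
summed = df.groupby('integer_id').sum()
print(summed)
#             int_field_1  int_field_2
# integer_id
# 2656                 72           72
# 9702                  3            3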
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns; this is Python's closest equivalent to dplyr's group_by + summarise logic.
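As a small sketch of that (reusing the same df as above; the aggregation choices and output column names are illustrative, not from the question):

# Several SQL-like aggregations at once, in the spirit of dplyr's summarise
stats = df.groupby('integer_id').agg(
    field_1_sum=('int_field_1', 'sum'),
    field_1_mean=('int_field_1', 'mean'),
    row_count=('int_field_1', 'count'),
)
print(stats)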
Separately, if you want to sum across selected columns of a DataFrame rather than within groups, DataFrame.sum() returns the sum of the values along the requested axis (you can select the columns first with loc[], iloc[], or a plain column list). To sum across columns row by row, pass axis=1.
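A quick sketch of that row-wise variant, in case it is what you are after (the column name row_total is purely illustrative):

# Sum selected columns across each row (axis=1), independent of any grouping
df['row_total'] = df[['int_field_1', 'int_field_2']].sum(axis=1)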
You just need to call sum on a groupby object:
df.groupby('integer_id').sum()
See the docs for further examples
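If you want integer_id back as a regular column (as in the SQL result) rather than as the index, either of these works:

df.groupby('integer_id', as_index=False).sum()
# or, equivalently
df.groupby('integer_id').sum().reset_index()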