So I have a dataframe, df1, that looks like the following:
     A   B             C
1  foo  12    California
2  foo  22    California
3  bar   8  Rhode Island
4  bar  32  Rhode Island
5  baz  15          Ohio
6  baz  26          Ohio
I want to group by column A and then sum column B while keeping the value in column C. Something like this:
     A   B             C
1  foo  34    California
2  bar  40  Rhode Island
3  baz  41          Ohio
The issue is that when I call df.groupby('A').sum(), column C gets removed, returning:
      B
A
bar  40
baz  41
foo  34
How can I get around this and keep column C when I group and sum?
Sum of a single column: you can use the pandas Series sum() method to get the sum of the values in an individual column (each DataFrame column is a pandas Series).
Use DataFrame.groupby().sum() to group rows by one or more columns and compute the sum for each group. groupby() returns a DataFrameGroupBy object, whose sum() method calculates the sum of the selected columns within each group.
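To make those two ideas concrete, here is a minimal sketch using the sample data from the question. The names total_b and per_group are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

# Sum of a single column: df["B"] is a Series, so .sum() returns one number
total_b = df["B"].sum()

# Grouped sum of column B only: a Series indexed by the values of A
per_group = df.groupby("A")["B"].sum()

print(total_b)
print(per_group)
```

Selecting ["B"] before calling sum() restricts the aggregation to that column, which is why C does not appear in the result.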
A straightforward way to do this is to include C in your groupby (the groupby function accepts a list of columns). This works here because C has the same value for every row within each group of A.
Give this a try:
df.groupby(['A','C'])['B'].sum()
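As a quick check, here is the same call run against the sample data from the question. This is a minimal sketch; the variable name result is only for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

# Group on both A and C, then sum B within each (A, C) pair.
# The result is a Series whose index is a MultiIndex with levels A and C.
result = df.groupby(["A", "C"])["B"].sum()

print(result)
print(result.index.names)
```

Note that A and C end up in the index rather than as ordinary columns.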
One other thing to note: if you need to keep working with the result as a dataframe after the aggregation, you can also pass the as_index=False option, which returns the group keys as regular columns instead of as the index. This one gave me problems when I was first working with pandas. Example:
df.groupby(['A','C'], as_index=False)['B'].sum()
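Run against the sample data from the question, this might look like the sketch below. The sort=False variant is an addition that is not in the original answer; it is shown on the assumption that you want the groups in their order of first appearance (foo, bar, baz), as in the desired output. The names flat and flat_unsorted are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

# as_index=False keeps A and C as regular columns of a DataFrame
flat = df.groupby(["A", "C"], as_index=False)["B"].sum()

# By default groups are sorted by key (bar, baz, foo);
# sort=False keeps them in order of first appearance instead
flat_unsorted = df.groupby(["A", "C"], as_index=False, sort=False)["B"].sum()

print(flat)
print(flat_unsorted)
```

The resulting columns come out in the order A, C, B; reorder them with flat[["A", "B", "C"]] if you need the layout from the question.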