Consider this dataframe
df = pd.DataFrame({'a': [1,2,1,3,4,2], 'c':['dd','ee','dd','as','ae','ee'], 'count':[5,9,1,6,8,3]})
a c count
0 1 dd 5
1 2 ee 9
2 1 dd 1
3 3 as 6
4 4 ae 8
5 2 ee 3
As you can see there are duplicates in column 'a' 1 and 2
are repeated multiple times.
i want to sum the count of such in pandas like in sql we do groupby.
my final df should look like this
a c count
0 1 dd 6
1 2 ee 12
2 3 as 6
3 4 ae 8
i tried by using
df = df.groupby('a')
but it is returning me
<pandas.core.groupby.DataFrameGroupBy object
You need groupby
by columns a
and c
with aggregating sum
:
df = df.groupby(['a','c'], as_index=False)['count'].sum()
print (df)
a c count
0 1 dd 6
1 2 ee 12
2 3 as 6
3 4 ae 8
But if need groupby only column a
, then is necessary aggregate
all columns which need in output - e.g. column c
is aggregate by first
and count
by sum
:
df = df.groupby('a').agg({'c':'first', 'count':'sum'}).reset_index()
print (df)
a c count
0 1 dd 6
1 2 ee 12
2 3 as 6
3 4 ae 8
You almost had it
df.groupby(['a', 'c']).sum().reset_index()
yields
a c count
0 1 dd 6
1 2 ee 12
2 3 as 6
3 4 ae 8
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With