Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas GroupBy sum concatenates numbers instead of summing them

When I use the following code:

    print(self.df.groupby(by=[2])[3].agg(['sum']))

On the following Dataframe:

            0    1        2        3       4     5     6     7
0          15  LCU   Test 1   308.02  170703  ALCU  4868  MS10
1          16  LCU   Test 2   127.37  170703  ALCU  4868  MS10

The sum function is not completed correctly because the value column (col 3) returns a concatenated string of the values (308.02127.37) instead of maintaining the integrity of the individual values to allow operation.

like image 240
RGETLB Avatar asked Sep 01 '25 04:09

RGETLB


1 Answers

It seems like your 3rd column is a string. Did you load in your dataframe using dtype=str?

Furthermore, try not to hardcode your columns. You can use .astype or pd.to_numeric to cast and then apply sum:

self.df.groupby(self.df.columns[2])[self.df.columns[3]].agg(
    lambda x: pd.to_numeric(x, errors='coerce').sum()
)

Or

self.df.groupby(self.df.columns[2])[self.df.columns[3]].agg(
    lambda x: x.astype(float).sum()
)
like image 137
cs95 Avatar answered Sep 02 '25 16:09

cs95