I have a pandas dataframe that looks as follows:
X Y
71455 [334.0, 319.0, 298.0, 323.0]
71455 [3.0, 8.0, 13.0, 10.0]
57674 [54.0, 114.0, 124.0, 103.0]
I want to perform an aggregate groupby that adds the lists stored in the Y column element-wise. Code I have tried:
df.groupby('X').agg({'Y' : sum})
The result is the following:
Y
X
71455 [334.0, 319.0, 298.0, 323.0, 75.0, 55.0, ...
So it has concatenated the lists instead of summing them element-wise. The expected result, however, is:
X Y
71455 [337.0, 327.0, 311.0, 333.0]
57674 [54.0, 114.0, 124.0, 103.0]
I have tried different methods, but could not get this to work as expected.
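A likely reason for the concatenation is that `+` on two Python lists joins them, whereas `+` on two NumPy arrays adds them position by position. The snippet below is only a small illustration of that difference using the first two rows above; it assumes NumPy is available, and the variable names are made up for the example. It is not a trace of what pandas does internally.

```python
import numpy as np

a = [334.0, 319.0, 298.0, 323.0]
b = [3.0, 8.0, 13.0, 10.0]

# list + list joins the two lists end to end
concatenated = a + b

# array + array adds the values at matching positions
elementwise = (np.array(a) + np.array(b)).tolist()

print(concatenated)
print(elementwise)
```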
It's possible to use apply on the grouped dataframe to get element-wise addition:
df.groupby('X')['Y'].apply(lambda x: [sum(y) for y in zip(*x)])
This results in a pandas Series:
X
57674 [54.0, 114.0, 124.0, 103.0]
71455 [337.0, 327.0, 311.0, 333.0]
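For anyone unsure how the lambda works: `zip(*x)` transposes the lists in a group, so each tuple holds the values that share a position, and `sum` then adds each tuple. Here is a self-contained sketch using the data from the question; the names `rows`, `transposed` and `res` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'X': [71455, 71455, 57674],
                   'Y': [[334.0, 319.0, 298.0, 323.0],
                         [3.0, 8.0, 13.0, 10.0],
                         [54.0, 114.0, 124.0, 103.0]]})

# zip(*rows) pairs up the values found at the same position in each list
rows = [[334.0, 319.0, 298.0, 323.0], [3.0, 8.0, 13.0, 10.0]]
transposed = list(zip(*rows))

# apply the same idea per group: sum each position across the group's lists
res = df.groupby('X')['Y'].apply(lambda x: [sum(y) for y in zip(*x)])
print(res)
```

Note that `zip` stops at the shortest list, so this assumes every list within a group has the same length.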
Pandas isn't designed for use with series of lists. Storing lists forces Pandas to use an object dtype series, which cannot be manipulated in a vectorised fashion. Instead, you can split your series of lists into numeric series before aggregating:
import pandas as pd

df = pd.DataFrame({'X': [71455, 71455, 57674],
                   'Y': [[334.0, 319.0, 298.0, 323.0],
                         [3.0, 8.0, 13.0, 10.0],
                         [54.0, 114.0, 124.0, 103.0]]})

# expand each list in Y into its own numeric columns (0, 1, 2, 3)
df = df.join(pd.DataFrame(df.pop('Y').values.tolist()))

# sum every numeric column within each group of X
res = df.groupby('X').sum().reset_index()

print(res)
X 0 1 2 3
0 57674 54.0 114.0 124.0 103.0
1 71455 337.0 327.0 311.0 333.0
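If you still need the result as a single column of lists, as in the expected output in the question, you can pack the summed columns back together at the end. This is a rough sketch under the assumption that all lists have the same length; `wide`, `summed` and `out` are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'X': [71455, 71455, 57674],
                   'Y': [[334.0, 319.0, 298.0, 323.0],
                         [3.0, 8.0, 13.0, 10.0],
                         [54.0, 114.0, 124.0, 103.0]]})

# one numeric column per list position, indexed by the group key X
wide = pd.DataFrame(df['Y'].tolist(), index=df['X'])

# vectorised sum of each position within each group
summed = wide.groupby(level=0).sum()

# pack each row of sums back into a single list
out = pd.DataFrame({'X': summed.index.tolist(),
                    'Y': summed.values.tolist()})
print(out)
```

The aggregation itself stays numeric and vectorised; the lists are only rebuilt at the very end for presentation.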