So, I'm learning pandas and I have this problem.
Suppose I have a DataFrame like this:
A B C
1 x NaN
2 y NaN
3 x NaN
4 x NaN
5 y NaN
I'm trying to create this:
A B C
1 x [1,3,4]
2 y [2,5]
3 x [1,3,4]
4 x [1,3,4]
5 y [2,5]
That is, for each row, C should list every value of A that shares the same value of B.
I did this:
teste = df.groupby(['B'])
for name, group in teste:
    df.loc[df['B'] == name[0], 'C'] = group['A'].tolist()
And I got this instead, as if column C were just a copy of column A:
A B C
1 x 1
2 y 2
3 x 3
4 x 4
5 y 5
Can anybody explain why this is happening and suggest a way to get the result I want? Thanks :)
You can first aggregate on column B and then merge the result back with the original df on B:
df
# A B
#0 1 x
#1 2 y
#2 3 x
#3 4 x
#4 5 y
df.groupby('B').A.apply(list).rename('C').reset_index().merge(df)
# B C A
#0 x [1, 3, 4] 1
#1 x [1, 3, 4] 3
#2 x [1, 3, 4] 4
#3 y [2, 5] 2
#4 y [2, 5] 5
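On the "why" part of the question: when a list is assigned to several rows through df.loc, pandas spreads it over those rows one element per row instead of storing the whole list in each cell, so each row ends up with its own value of A. Here is a minimal sketch of that behaviour, followed by the aggregate-then-merge idea written out in full. The how='left' argument and the names demo, lists and result are my own additions for illustration; a left merge keeps the rows in the original order of df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"]})

# Why the loop fails: a list assigned to N selected rows is spread
# element by element over those rows, not stored whole in each cell.
demo = df.assign(C=np.nan)
demo.loc[demo["B"] == "x", "C"] = [1, 3, 4]  # rows 0, 2, 3 receive 1, 3, 4

# Aggregate A into one list per value of B, then join the lists back on B.
lists = df.groupby("B")["A"].apply(list).rename("C").reset_index()
result = df.merge(lists, on="B", how="left")  # left merge keeps df's row order
```

With how='left' the columns come out as A, B, C and the rows stay in their original order, unlike the default merge shown above.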
You could use transform to create the lists.
In [324]: df['C'] = df.groupby('B')['A'].transform(lambda x: [x.values])
In [325]: df
Out[325]:
A B C
0 1 x [1, 3, 4]
1 2 y [2, 5]
2 3 x [1, 3, 4]
3 4 x [1, 3, 4]
4 5 y [2, 5]
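If transform with a list-returning lambda misbehaves on your pandas version (I have not checked every release), a variant that I would expect to be less fragile is to build one list per group and look it up with map. This is a sketch of a different technique, not the transform approach itself, and the name lists_by_group is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"]})

# One list of A values per distinct value of B (a Series indexed by B).
lists_by_group = df.groupby("B")["A"].agg(list)

# Look up each row's B value in that Series to fill column C.
df["C"] = df["B"].map(lists_by_group)
```

This keeps the original row order and index, since map works row by row on column B.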
Sum-thing creative! Make A single-valued lists, then do a transform with sum.
df.assign(
    C=pd.Series(
        df.A.values[:, None].tolist(), df.index
    ).groupby(df.B).transform('sum')
)
A B C
0 1 x [1, 3, 4]
1 2 y [2, 5]
2 3 x [1, 3, 4]
3 4 x [1, 3, 4]
4 5 y [2, 5]
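The trick rests on the fact that + concatenates Python lists, so "summing" one-element lists glues them into a single list. A plain-Python illustration of that idea for the x group (this only shows the principle, not what pandas does internally; the names are illustrative):

```python
# Each value of A in group "x", wrapped in its own one-element list.
singletons = [[1], [3], [4]]

# Summing lists with an empty list as the start value concatenates them.
combined = sum(singletons, [])
```

Here combined is the list [1, 3, 4], which is what every row of the x group receives in column C.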