Pandas update column with array

Question

I'm learning pandas and ran into this problem.

Suppose I have a DataFrame like this:

A B C
1 x NaN
2 y NaN
3 x NaN
4 x NaN
5 y NaN

I'm trying to create this:

A B C
1 x [1,3,4]
2 y [2,5]
3 x [1,3,4]
4 x [1,3,4]
5 y [2,5]

That is, each row's C should hold the list of every A value whose row has the same B.

I did this:

teste = df.groupby(['B'])
for name,group in teste:
    df.loc[df['B'] == name[0],'C'] = group['A'].tolist()

But I got this instead; column C just ends up copying column A:

A B C
1 x 1
2 y 2
3 x 3
4 x 4
5 y 5

Can anybody explain why this happens, and how to get the result I want? Thanks :)
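Why this happens: when the value assigned through .loc is a list whose length matches the number of selected rows, pandas writes it element-wise, one item per row, instead of storing the whole list in every cell. For group x the list [1, 3, 4] therefore lands as 1, 3 and 4 in the three x rows, which is exactly column A. A minimal reproduction of that single assignment, assuming the df from the question:

```python
import numpy as np
import pandas as pd

# The question's starting frame, with C initially all NaN (float dtype).
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"], "C": np.nan})

# Select the three rows where B == "x" and assign a 3-element list:
# pandas pairs the list items with the selected rows one by one.
mask = df["B"] == "x"
df.loc[mask, "C"] = [1, 3, 4]

print(df)
```

Rows 0, 2 and 3 receive 1.0, 3.0 and 4.0 (C stays float because it started as NaN), and the y rows are still NaN.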

Psidom · Accepted Answer

You can first aggregate A into one list per value of B, then merge that result back into the original df on B:

df
#   A   B
#0  1   x
#1  2   y
#2  3   x
#3  4   x
#4  5   y

df.groupby('B').A.apply(list).rename('C').reset_index().merge(df)

#   B           C   A
#0  x   [1, 3, 4]   1
#1  x   [1, 3, 4]   3
#2  x   [1, 3, 4]   4
#3  y      [2, 5]   2
#4  y      [2, 5]   5
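Note that this result is ordered by B rather than by the original rows, and that merge without on= joins on every column the two frames share, so if df already has the NaN column C from the question it should be dropped first. A self-contained sketch of the same aggregate-then-merge idea, swapping in a left merge from df (how="left" keeps the left frame's row order) so the rows stay in their original order; the example frame is an assumption mirroring the question:

```python
import pandas as pd

# Example frame from the question, without the placeholder C column.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"]})

# One row per B value, holding the list of that group's A values.
lists_by_b = df.groupby("B")["A"].apply(list).rename("C").reset_index()

# Left merge: every row of df is kept, in its original order,
# and picks up the list belonging to its B value.
result = df.merge(lists_by_b, on="B", how="left")
print(result)
```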

Zero · Answer

You could use transform to create the lists; it returns a result aligned with the original rows, so each row receives its own group's list.

In [324]: df['C'] = df.groupby('B')['A'].transform(lambda x: [x.values])

In [325]: df
Out[325]:
   A  B          C
0  1  x  [1, 3, 4]
1  2  y     [2, 5]
2  3  x  [1, 3, 4]
3  4  x  [1, 3, 4]
4  5  y     [2, 5]
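The lambda above returns a one-element list and appears to rely on pandas spreading that single object across all rows of the group. If your pandas version raises a length-mismatch error there instead, one alternative (a different technique from this answer, shown only as a sketch with an assumed example frame) is to build each group's list once with agg(list) and look it up per row with Series.map:

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"]})

# Series indexed by B: "x" -> [1, 3, 4], "y" -> [2, 5].
lists_by_b = df.groupby("B")["A"].agg(list)

# map() looks up each row's B in that index and returns the stored list.
df["C"] = df["B"].map(lists_by_b)
print(df)
```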

piRSquared · Answer

Sum-thing creative!
Turn each value of A into a single-element list, then do a groupby transform with sum.

df.assign(
    C=pd.Series(
        df.A.values[:, None].tolist(), df.index
    ).groupby(df.B).transform('sum')
)

   A  B          C
0  1  x  [1, 3, 4]
1  2  y     [2, 5]
2  3  x  [1, 3, 4]
3  4  x  [1, 3, 4]
4  5  y     [2, 5]
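The trick works because + on Python lists concatenates, so "summing" [1], [3] and [4] yields [1, 3, 4]. A self-contained version of the same code, with the import and the example frame filled in as assumptions:

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"]})

# Plain-Python view of the idea: adding lists concatenates them.
print(sum([[1], [3], [4]], []))

# Wrap every A value in its own one-element list: [[1], [2], [3], [4], [5]].
singletons = pd.Series(df.A.values[:, None].tolist(), df.index)

# Within each B group, "sum" concatenates those lists; transform
# broadcasts each group's result back onto that group's rows.
result = df.assign(C=singletons.groupby(df.B).transform("sum"))
print(result)
```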

Tags: python, pandas, dataframe, pandas-groupby

Asked by Artur Barbosa

