Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Updating a column in a Pandas DataFrame based on a condition of another column

Tags:

python

pandas

I'm interested in adding a text tag to a new column in a Pandas dataframe. The following example works but I get the copy warning and I don't fully understand if I should ignore it in this case.

The DataFrame simply has either a character or is an empty string:

In [1]: import pandas as pd

In [2]: df=pd.DataFrame({('A'):['x','','x',''], ('B'):['x','x','','']})

In [3]: df
Out[3]:
   A  B
0  x  x
1     x
2  x
3

Create a new column called 'msg'

In [4]: df['msg'] = ''

In [5]: df
Out[5]:
   A  B msg
0  x  x
1     x
2  x
3

Set the 'msg' column to 'red' if 'A' is not an empty string

In [6]: df['msg'][df['A'] != ''] = 'red;'

In [7]: df
Out[7]:
   A  B  msg
0  x  x  red;
1     x
2  x     red;
3

Concatenate 'blue' depending on the 'B' column values

In [8]: df['msg'][df['B'] != ''] += 'blue;'

In [9]: df
Out[9]:
   A  B       msg
0  x  x  red;blue;
1     x     blue;
2  x         red;
3

Alternatively, I found using numpy.where produced the desired result. What is the proper way to do this in Pandas?

import numpy as np

df['msg'] += np.where(df['A'] != '','green;', '')

Update: 4/15/2018

Upon further thought, it would be useful to retain the data from the original DataFrame in certain cases but still attach a label ('color' in this example). The answer from @COLDSPEED led me to the following (changing 'blue;' to 'blue:' and preserving column 'B' data to include in the tag in this case):

df['msg'] = (v.where(df.applymap(len) > 0, '') + 
             df.where(df[['B']].applymap(len)>0,'')).agg(''.join, axis=1)


   A  B         msg
0  x  x  red;blue:x
1     x      blue:x
2  x           red;
3
like image 1000
Robert Avatar asked Dec 14 '22 17:12

Robert


1 Answers

If you know your colors before-hand, you could use masking with DataFrame.where and str.join to get this done.

v = pd.DataFrame(
     np.repeat([['red;', 'blue;']], len(df), axis=0), 
     columns=df.columns, 
     index=df.index
) 
df['msg'] = v.where(df.applymap(len) > 0, '').agg(''.join, axis=1)

df
   A  B        msg
0  x  x  red;blue;
1     x      blue;
2  x          red;
3              
like image 196
cs95 Avatar answered Dec 29 '22 10:12

cs95