
Pandas: Creating new column based on values from existing column

Tags:

python

pandas

I have a pandas dataframe with two columns as follows:

A      B
Yes    No
Yes    Yes
No     Yes
No     No
NA     Yes
NA     NA

I want to create a new column based on these values: if either column is Yes, the new column should also be Yes. If both columns are No, the new column should be No. Finally, if both columns are NA, the new column should also be NA. The expected output for the data above is:

C
Yes
Yes
Yes
No
Yes
NA

I wrote a loop over the length of the dataframe that checks each value to build the new column. However, it takes a long time for 10M records. Is there a faster, more Pythonic way to achieve this?

Haroon S. asked Dec 31 '22 03:12


2 Answers

Something like this, which relies on string ordering ('' < 'No' < 'Yes'):

df.fillna('').max(axis=1)
Out[106]: 
0    Yes
1    Yes
2    Yes
3     No
4    Yes
5       
dtype: object
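
Note that the last row above comes back as an empty string rather than NA. Below is a minimal sketch of the same idea that restores the missing value afterwards. It assumes the NA entries are real missing values (NaN) and not the literal string 'NA'. The sample frame is rebuilt here only for illustration, and `filled_max` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NA is assumed to mean a missing value (NaN).
df = pd.DataFrame(
    {
        "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
        "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
    },
    dtype=object,
)

# Strings compare lexicographically, so "" < "No" < "Yes" and the
# row-wise max picks "Yes" whenever either column holds it.
filled_max = df.fillna("").max(axis=1)

# Rows where both columns were missing end up as "", so turn those back into NaN.
df["C"] = filled_max.mask(filled_max == "")
print(df)
```

A row with one No and one NA, which the question does not cover, comes out as No with this approach.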
BENY answered Jan 13 '23 22:01


Try:

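# Flag rows where either column is 'Yes', map the booleans back to 'Yes'/'No',
# then set rows where both columns are missing to NaN.
(df == 'Yes').eval('A | B').map({True: 'Yes', False: 'No'}).mask(df['A'].isna() & df['B'].isna())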
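
For comparison, the same rule can be spelled out with explicit boolean masks and `np.where`, which stays vectorized and so avoids a Python-level loop. This is only a sketch: it assumes NA means a missing value (NaN), rebuilds the question's sample data for illustration, and uses the hypothetical names `any_yes` and `both_na`. A row with one No and one NA, which the question does not specify, is treated as No here.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NA is assumed to mean a missing value (NaN).
df = pd.DataFrame(
    {
        "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
        "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
    },
    dtype=object,
)

# True where at least one of the two columns equals "Yes".
any_yes = df[["A", "B"]].eq("Yes").any(axis=1)

# True where both columns are missing.
both_na = df[["A", "B"]].isna().all(axis=1)

# Start from a plain Yes/No answer, then set the all-missing rows to NaN.
c = pd.Series(np.where(any_yes, "Yes", "No"), index=df.index, dtype=object)
df["C"] = c.mask(both_na)
print(df)
```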
Scott Boston answered Jan 13 '23 22:01