I have a pandas dataframe with two columns as follows:
A B
Yes No
Yes Yes
No Yes
No No
NA Yes
NA NA
I want to create a new column based on these values such that if either column value is Yes, the value in the new column should also be Yes. If both columns have the value No, the new column should also have the value No. Finally, if both columns have the value NA, the new column should also be NA. Example output for the above data is:
C
Yes
Yes
Yes
No
Yes
NA
I wrote a loop over the length of the dataframe that checks each value to build the new column. However, it takes a long time for 10M records. Is there a faster, more Pythonic way to achieve this?
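Something like the following, which works because the strings sort as '' < 'No' < 'Yes'. Note that the all-NA row comes back as an empty string rather than NA: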
df.fillna('').max(axis=1)
Out[106]:
0 Yes
1 Yes
2 Yes
3 No
4 Yes
5
dtype: object
Try:
(df == 'Yes').eval('A | B').map({True: 'Yes', False: 'No'}).mask(df['A'].isna() & df['B'].isna())
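For comparison, here is a minimal sketch of the same rule written as explicit boolean masks. It assumes the missing entries are real NaN values rather than the literal string 'NA', and it rebuilds the sample frame from the question. The names any_yes and all_missing are only illustrative. A row with one No and one missing value is not covered by the question; this sketch gives it No, which is an assumption.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NA is assumed to be a real missing value.
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
})

# True where at least one of the two columns is 'Yes'.
any_yes = df["A"].eq("Yes") | df["B"].eq("Yes")

# True where both columns are missing.
all_missing = df["A"].isna() & df["B"].isna()

# Pick 'Yes' or 'No' element-wise, then blank out the all-missing rows.
df["C"] = pd.Series(np.where(any_yes, "Yes", "No"), index=df.index).mask(all_missing)

print(df)
```

Everything here is a column-wise operation, so there is no Python-level loop over the rows, which is what should matter for a frame with 10M records.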