Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

New column based on multiple conditions ignoring missing values

I have following dataframe with some missing values:

       A     B
0  63.0   9.0
1   NaN  35.0
2  51.0  95.0
3  25.0  11.0
4  91.0   NaN
5   2.0  47.0
6  37.0  10.0
7   NaN  88.0
8  75.0  87.0
9  92.0  21.0

I want to create a new column based on conditions of both above columns:

df['C'] = numpy.where((df['A']>55) | (df['B']>55), "Yes", "No")

This works but does not take into account missing values:

      A     B    C
0  63.0   9.0  Yes
1   NaN  35.0   No
2  51.0  95.0  Yes
3  25.0  11.0   No
4  91.0   NaN  Yes
5   2.0  47.0   No
6  37.0  10.0   No
7   NaN  88.0  Yes
8  75.0  87.0  Yes
9  92.0  21.0  Yes

For correcting for missing values, I have to run following code:

df['C'] = numpy.where((df['A'].isnull()) | (df['B'].isnull()), numpy.nan, df['C'])

Then I get proper new column:

      A     B    C
0  63.0   9.0  Yes
1   NaN  35.0  NaN
2  51.0  95.0  Yes
3  25.0  11.0   No
4  91.0   NaN  NaN
5   2.0  47.0   No
6  37.0  10.0   No
7   NaN  88.0  NaN
8  75.0  87.0  Yes
9  92.0  21.0  Yes

Is there any better way so that I can correct for missing values in single line of code only?

like image 989
rnso Avatar asked Mar 02 '26 06:03

rnso


1 Answers

Using np.select. The order of the conditions is important here, as np.select chooses the first valid option, so you must have your null check first.

c1 = df.isnull().any(1)
c2 = df.gt(55).any(1)

df['C'] = np.select([c1, c2], [np.nan, 'Yes'], 'No')

      A     B    C
0  63.0   9.0  Yes
1   NaN  35.0  nan
2  51.0  95.0  Yes
3  25.0  11.0   No
4  91.0   NaN  nan
5   2.0  47.0   No
6  37.0  10.0   No
7   NaN  88.0  nan
8  75.0  87.0  Yes
9  92.0  21.0  Yes
like image 149
user3483203 Avatar answered Mar 04 '26 20:03

user3483203



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!