New column based on multiple conditions ignoring missing values

Question

I have following dataframe with some missing values:

       A     B
0  63.0   9.0
1   NaN  35.0
2  51.0  95.0
3  25.0  11.0
4  91.0   NaN
5   2.0  47.0
6  37.0  10.0
7   NaN  88.0
8  75.0  87.0
9  92.0  21.0

I want to create a new column based on conditions of both above columns:

df['C'] = numpy.where((df['A']>55) | (df['B']>55), "Yes", "No")

This works but does not take into account missing values:

      A     B    C
0  63.0   9.0  Yes
1   NaN  35.0   No
2  51.0  95.0  Yes
3  25.0  11.0   No
4  91.0   NaN  Yes
5   2.0  47.0   No
6  37.0  10.0   No
7   NaN  88.0  Yes
8  75.0  87.0  Yes
9  92.0  21.0  Yes

For correcting for missing values, I have to run following code:

df['C'] = numpy.where((df['A'].isnull()) | (df['B'].isnull()), numpy.nan, df['C'])

Then I get proper new column:

      A     B    C
0  63.0   9.0  Yes
1   NaN  35.0  NaN
2  51.0  95.0  Yes
3  25.0  11.0   No
4  91.0   NaN  NaN
5   2.0  47.0   No
6  37.0  10.0   No
7   NaN  88.0  NaN
8  75.0  87.0  Yes
9  92.0  21.0  Yes

Is there any better way so that I can correct for missing values in single line of code only?

user3483203 · Accepted Answer

Using np.select. The order of the conditions is important here, as np.select chooses the first valid option, so you must have your null check first.

c1 = df.isnull().any(1)
c2 = df.gt(55).any(1)

df['C'] = np.select([c1, c2], [np.nan, 'Yes'], 'No')

      A     B    C
0  63.0   9.0  Yes
1   NaN  35.0  nan
2  51.0  95.0  Yes
3  25.0  11.0   No
4  91.0   NaN  nan
5   2.0  47.0   No
6  37.0  10.0   No
7   NaN  88.0  nan
8  75.0  87.0  Yes
9  92.0  21.0  Yes

New column based on multiple conditions ignoring missing values

Tags:

python

pandas

numpy

rnso

1 Answers

user3483203

Recent Activity

Donate For Us

New column based on multiple conditions ignoring missing values

Tags:

python

pandas

numpy

rnso

1 Answers

user3483203

Related questions

Recent Activity

Donate For Us