I have following dataframe with some missing values:
A B
0 63.0 9.0
1 NaN 35.0
2 51.0 95.0
3 25.0 11.0
4 91.0 NaN
5 2.0 47.0
6 37.0 10.0
7 NaN 88.0
8 75.0 87.0
9 92.0 21.0
I want to create a new column based on conditions of both above columns:
df['C'] = numpy.where((df['A']>55) | (df['B']>55), "Yes", "No")
This works but does not take into account missing values:
A B C
0 63.0 9.0 Yes
1 NaN 35.0 No
2 51.0 95.0 Yes
3 25.0 11.0 No
4 91.0 NaN Yes
5 2.0 47.0 No
6 37.0 10.0 No
7 NaN 88.0 Yes
8 75.0 87.0 Yes
9 92.0 21.0 Yes
For correcting for missing values, I have to run following code:
df['C'] = numpy.where((df['A'].isnull()) | (df['B'].isnull()), numpy.nan, df['C'])
Then I get proper new column:
A B C
0 63.0 9.0 Yes
1 NaN 35.0 NaN
2 51.0 95.0 Yes
3 25.0 11.0 No
4 91.0 NaN NaN
5 2.0 47.0 No
6 37.0 10.0 No
7 NaN 88.0 NaN
8 75.0 87.0 Yes
9 92.0 21.0 Yes
Is there any better way so that I can correct for missing values in single line of code only?
Using np.select. The order of the conditions is important here, as np.select chooses the first valid option, so you must have your null check first.
c1 = df.isnull().any(1)
c2 = df.gt(55).any(1)
df['C'] = np.select([c1, c2], [np.nan, 'Yes'], 'No')
A B C
0 63.0 9.0 Yes
1 NaN 35.0 nan
2 51.0 95.0 Yes
3 25.0 11.0 No
4 91.0 NaN nan
5 2.0 47.0 No
6 37.0 10.0 No
7 NaN 88.0 nan
8 75.0 87.0 Yes
9 92.0 21.0 Yes
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With