I have two boolean columns A and B in a pandas dataframe, each with missing data (represented by NaN). What I want is to do an AND operation on the two columns, but I want the resulting boolean column to be NaN if either of the original columns is NaN. I have the following table:
A B
0 True True
1 True False
2 False True
3 True NaN
4 NaN NaN
5 NaN False
Now when I do df.A & df.B, I want:
0 True
1 False
2 False
3 NaN
4 NaN
5 False
dtype: bool
but instead I get:
0 True
1 False
2 False
3 True
4 True
5 False
dtype: bool
This behaviour is consistent with np.bool(np.nan) & np.bool(False) and its permutations, but what I really want is a column that tells me whether each row is certainly True for both columns or certainly not True for both. If I know both values are True, the result should be True; if I know at least one is False, it should be False; otherwise I need NaN to show that the datum is missing.
Is there a way to achieve this?
This operation is directly supported by pandas, provided you are using the new nullable boolean type, boolean (not to be confused with the traditional NumPy bool type).
# Setup
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [True, True, False, True, np.nan, np.nan],
                   'B': [True, False, True, np.nan, np.nan, False]})
df.dtypes
A object
B object
dtype: object
# A little shortcut to convert the data type to `boolean`
df2 = df.convert_dtypes()
df2.dtypes
A boolean
B boolean
dtype: object
df2['A'] & df2['B']
0 True
1 False
2 False
3 <NA>
4 <NA>
5 False
dtype: boolean
In conclusion, please consider upgrading to pandas 1.0 or later, where the nullable boolean type was introduced :-)
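If you would rather not rely on convert_dtypes() inferring the type, the cast can also be written out explicitly. The following is only a minimal sketch, assuming pandas 1.0 or later; the variable names a, b, both and either are made up for illustration. It also shows the OR case, which follows the same three-valued logic in mirror image.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [True, True, False, True, np.nan, np.nan],
    "B": [True, False, True, np.nan, np.nan, False],
})

# Cast each object column to the nullable "boolean" extension dtype explicitly.
a = df["A"].astype("boolean")
b = df["B"].astype("boolean")

both = a & b    # <NA> when the answer is unknown; a known False still gives False
either = a | b  # mirror image: a known True still gives True

print(pd.DataFrame({"A": a, "B": b, "A & B": both, "A | B": either}))
```

Note that a missing value does not always propagate: NA & False is False and NA | True is True, because the known operand already settles the result.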
Let's use np.logical_and:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [True, True, False, True, np.nan, np.nan],
                   'B': [True, False, True, np.nan, np.nan, False]})
s = np.logical_and(df['A'],df['B'])
print(s)
Output:
0 True
1 False
2 False
3 NaN
4 NaN
5 False
Name: A, dtype: object
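One caveat, as far as I can tell: on object columns np.logical_and appears to behave like Python's `and`, which returns one of its operands, so the result can depend on argument order. A row with A = NaN and B = True (not present in the sample data) may come out as True rather than NaN. If you are stuck on object columns, a symmetric version can be spelled out by hand. This is only a sketch; kleene_and is a hypothetical helper name, not a pandas or NumPy function, and the extra seventh row is added purely to exercise that case.

```python
import numpy as np
import pandas as pd

def kleene_and(a, b):
    """Three-valued AND for columns holding True, False, or NaN (hypothetical helper)."""
    both_true = (a == True) & (b == True)    # NaN == True evaluates to False
    any_false = (a == False) | (b == False)  # a known False settles the row
    out = pd.Series(np.nan, index=a.index, dtype=object)  # start as "unknown"
    out[both_true] = True
    out[any_false] = False
    return out

df = pd.DataFrame({
    "A": [True, True, False, True, np.nan, np.nan, np.nan],
    "B": [True, False, True, np.nan, np.nan, False, True],
})

result = kleene_and(df["A"], df["B"])
print(result)
```

Swapping the arguments, kleene_and(df["B"], df["A"]), marks the same rows as missing, which is the property the question asks for.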