 

Preserve NaN values in pandas boolean comparisons

I have two boolean columns A and B in a pandas dataframe, each with missing data (represented by NaN). What I want is to do an AND operation on the two columns, but I want the resulting boolean column to be NaN if either of the original columns is NaN. I have the following table:

    A      B
0   True   True    
1   True   False   
2   False  True   
3   True   NaN    
4   NaN    NaN
5   NaN    False

Now when I do df.A & df.B I want:

0    True
1    False
2    False
3    NaN
4    NaN
5    False
dtype: object

but instead I get:

0    True
1    False
2    False
3    True
4    True
5    False
dtype: bool

This behaviour is consistent with np.bool(np.nan) & np.bool(False) and its permutations, because NaN is truthy and is therefore treated as True. What I really want, though, is a column that tells me for certain whether each row is True for both columns, or for certain cannot be True for both. If I know both are True, the result should be True; if I know at least one is False, it should be False; otherwise I need NaN to show that the datum is missing.
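The rule described above is three-valued (Kleene) AND. A minimal scalar sketch in plain Python makes it concrete and also shows why plain truthiness gets it wrong. It assumes that `None` or a float NaN marks a missing value; the helper names `is_missing` and `and3` are hypothetical.

```python
import math

# NaN is truthy in Python, so truthiness-based logic treats a missing value as True.
print(bool(float("nan")))  # True


def is_missing(x):
    """Hypothetical helper: treat None and float NaN as 'unknown'."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def and3(a, b):
    """Hypothetical helper: three-valued (Kleene) AND on True / False / missing."""
    # A known False on either side settles the answer, whatever the other side is.
    if (not is_missing(a) and not a) or (not is_missing(b) and not b):
        return False
    # Otherwise, any missing operand leaves the result unknown.
    if is_missing(a) or is_missing(b):
        return float("nan")
    return True


nan = float("nan")
rows = [(True, True), (True, False), (False, True), (True, nan), (nan, nan), (nan, False)]
for i, (a, b) in enumerate(rows):
    print(i, and3(a, b))
```

Checking for a known False first is what makes `and3(nan, False)` return False instead of NaN, matching row 5 of the desired output.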

Is there a way to achieve this?

Asked Jun 27 '17 by Will Bryant




2 Answers

pandas >= 1.0

This operation is supported directly by pandas, provided you use the nullable Boolean dtype `boolean` (not to be confused with NumPy's traditional `bool` dtype). It stores missing values as `pd.NA` and applies three-valued (Kleene) logic in `&` and `|`.

# Setup
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [True, True, False, True, np.nan, np.nan],
                   'B': [True, False, True, np.nan, np.nan, False]})

df.dtypes

A    object
B    object
dtype: object

# A little shortcut to convert the data type to `boolean`
df2 = df.convert_dtypes()
df2.dtypes

A    boolean
B    boolean
dtype: object

df2['A'] & df2['B']

0     True
1    False
2    False
3     <NA>
4     <NA>
5    False
dtype: boolean

In short, if you can, upgrade to pandas 1.0 or later :-)

Answered Oct 26 '22 by cs95
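A minimal sketch of the same idea, assuming pandas 1.0 or later: instead of `convert_dtypes()`, which infers new dtypes for every column, you can cast just the two columns to `"boolean"` explicitly. The scalar lines show that `pd.NA` itself follows the same Kleene rules, and the last step converts back to an object array with NaN, in case downstream code expects NaN rather than `pd.NA`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [True, True, False, True, np.nan, np.nan],
                   'B': [True, False, True, np.nan, np.nan, False]})

# Cast the object columns explicitly to the nullable "boolean" dtype;
# NaN entries become pd.NA.
a = df['A'].astype('boolean')
b = df['B'].astype('boolean')

result = a & b  # Kleene AND: known False wins, otherwise NA propagates
print(result)

# pd.NA follows the same three-valued logic at the scalar level.
print(pd.NA & False)  # False
print(pd.NA & True)   # <NA>

# Convert back to an object array with NaN for the missing entries.
as_object = result.to_numpy(dtype=object, na_value=np.nan)
print(as_object)
```

Casting only the columns you need keeps the rest of the DataFrame's dtypes untouched.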


Let's use np.logical_and:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[True, True, False, True, np.nan, np.nan], 
                   'B':[True, False, True, np.nan, np.nan, False]})

s = np.logical_and(df['A'], df['B'])
print(s)

Output:

0     True
1    False
2    False
3      NaN
4      NaN
5    False
Name: A, dtype: object

Answered Oct 26 '22 by Scott Boston
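The `np.logical_and` output above matches Python's `x and y` semantics on every row (returning `x` if `x` is falsy, else `y`), which suggests NumPy's object loop behaves that way. If so, the result depends on operand order: a row with A = NaN and B = True would come out True rather than NaN. For pandas versions without the nullable `boolean` dtype, a sketch that builds the three-valued result from explicit masks avoids that. `kleene_and` is a hypothetical helper name, and the extra seventh row (NaN, True) is added only for illustration.

```python
import numpy as np
import pandas as pd


def kleene_and(a, b):
    """Hypothetical helper: element-wise three-valued AND for object Series
    holding True / False / NaN. A known False wins, two known Trues give
    True, and anything else is NaN."""
    known_false = a.eq(False) | b.eq(False)  # NaN never compares equal to False
    both_true = a.eq(True) & b.eq(True)      # NaN never compares equal to True
    out = pd.Series(np.nan, index=a.index, dtype=object)
    out[both_true] = True
    out[known_false] = False
    return out


df = pd.DataFrame({'A': [True, True, False, True, np.nan, np.nan, np.nan],
                   'B': [True, False, True, np.nan, np.nan, False, True]})
print(kleene_and(df['A'], df['B']))
```

Because the masks are built symmetrically, swapping the arguments gives the same result.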