I get some surprising results when trying to evaluate logical expressions on data that might contain nan values (as defined in numpy).
I would like to understand why these results arise and how to implement the correct behaviour.
What I don't understand is why these expressions evaluate to the values they do:
from numpy import nan

>>> nan and True
True
# this is wrong.. I would expect this to evaluate to nan
>>> True and nan
nan
# OK
>>> nan and False
False
# OK; regardless of the value of the first element,
# the expression should evaluate to False
>>> False and nan
False
# OK
Similarly for or:
>>> True or nan
True  # OK
>>> nan or True
nan  # wrong; the expression should be True
>>> False or nan
nan  # OK
>>> nan or False
nan  # OK
How can I implement (efficiently) the correct boolean functions, handling nan values as well?
You can use the logical functions from the numpy namespace:
>>> import numpy as np
>>> np.logical_and(True, np.nan), np.logical_and(False, np.nan)
(True, False)
>>> np.logical_and(np.nan, True), np.logical_and(np.nan, False)
(True, False)
>>>
>>> np.logical_or(True, np.nan), np.logical_or(False, np.nan)
(True, True)
>>> np.logical_or(np.nan, True), np.logical_or(np.nan, False)
(True, True)
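For whole arrays (the efficient case the question asks about), the same functions apply elementwise. A minimal sketch with made-up sample values; note that nan is a non-zero float, so these functions treat it as True rather than propagating it:
>>> import numpy as np
>>> a = np.array([1.0, 0.0, np.nan, np.nan])   # hypothetical sample data
>>> b = np.array([np.nan, np.nan, 0.0, 2.0])
>>> np.logical_and(a, b)   # elementwise and; nan counts as True
array([ True, False, False,  True])
>>> np.logical_or(a, b)    # elementwise or
array([ True,  True,  True,  True])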
EDIT: The built-in boolean operators behave slightly differently. From the docs: "x and y" is equivalent to "if x is false, then x, else y". So, if the first argument evaluates to false, it is returned as-is (not its boolean equivalent). Therefore:
>>> (None and True) is None
True
>>> [] and True
[]
>>> [] and False
[]
>>>
etc
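This is also what explains the results in the question: bool(nan) is True because nan is a non-zero float, and the built-in operators return one of their operands rather than a computed boolean. A small illustration:
>>> from numpy import nan
>>> bool(nan)
True
>>> nan and True    # nan is truthy, so 'and' returns the second operand
True
>>> nan or True     # nan is truthy, so 'or' returns nan itself
nan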