Why does the following code return False?
>>> pd.Series([np.nan]) | pd.Series([True])
0 False
dtype: bool
Compare your case (with the explicit dtype to emphasize the inferred one):
In[11]: pd.Series([np.nan], dtype=float) | pd.Series([True])
Out[11]: 0 False dtype: bool
with a similar one (only dtype is now bool):
In[12]: pd.Series([np.nan], dtype=bool) | pd.Series([True])
Out[12]: 0 True dtype: bool
Do you see the difference?
The explanation:
In the first case (yours), np.nan propagates itself in the logical operation or (under the hood):
In[13]: np.nan or True
Out[13]: nan
and pandas treated np.nan as False in the context of a boolean operation result.
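The plain-Python half of this explanation can be checked without pandas. This is a minimal sketch, assuming only that np.nan is an ordinary Python float, so float("nan") stands in for it here. The variable names are illustrative.

```python
import math

# float("nan") stands in for np.nan here; both are plain Python floats.
nan = float("nan")

# NaN is not equal to zero, so it is truthy.
nan_is_truthy = bool(nan)

# `or` returns its first truthy operand unchanged rather than a bool,
# so the NaN itself comes back.
or_result = nan or True

print(nan_is_truthy)          # True
print(math.isnan(or_result))  # True
```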
In the second case the output is unambiguous, as the first series holds a boolean value (True, since all non-zero values are considered True, including np.nan, but it doesn't matter in this case):
In[14]: pd.Series([np.nan], dtype=bool)
Out[14]: 0 True dtype: bool
and True or True gives True, of course:
In[15]: True or True
Out[15]: True
I think this is because np.nan is an instance of float, and float's __bool__ returns True for any non-zero value, NaN included:
np.nan.__bool__() == True
In the same way:
>>> np.nan or None
nan
A solution in pandas would be:
pd.Series([np.nan]).fillna(False) | pd.Series([True])
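As a slightly fuller sketch of the same workaround, the snippet below fills the missing values and then casts to bool explicitly, so the left operand has a predictable dtype before the | is applied. The sample data and the names left, right, left_clean, and combined are made up for illustration, and the extra astype(bool) is an addition rather than part of the one-liner above.

```python
import numpy as np
import pandas as pd

left = pd.Series([np.nan, 0.0, 1.0, np.nan])
right = pd.Series([True, False, False, False])

# Replace NaN with False first, then cast, so NaN never reaches
# the cast as a truthy float.
left_clean = left.fillna(False).astype(bool)

combined = left_clean | right
print(combined.tolist())  # [True, False, True, False]
```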
EDIT ***
For clarity, in pandas 0.24.1, in the method _bool_method_SERIES on line 1816 of .../pandas/core/ops.py, there is an assignment:
fill_bool = lambda x: x.fillna(False).astype(bool)
which is where the behaviour you are describing is coming from. I.e. it's been purposefully designed so that np.nan is treated like a False value (whenever doing an or operation).
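To see why the order of operations inside fill_bool matters, the sketch below wraps the same expression in a standalone function and compares it with a direct cast. It is a reconstruction for illustration, not the pandas source, and the sample series is made up.

```python
import numpy as np
import pandas as pd

def fill_bool(x):
    # Same expression as the lambda quoted above, written as a function.
    return x.fillna(False).astype(bool)

s = pd.Series([np.nan, 1.0, 0.0])

filled_then_cast = fill_bool(s)   # NaN becomes False before the cast
cast_directly = s.astype(bool)    # NaN is a non-zero float, so it becomes True

print(filled_then_cast.tolist())  # [False, True, False]
print(cast_directly.tolist())     # [True, True, False]
```

Filling before casting reproduces the False from the question, while casting directly reproduces the True seen in the dtype=bool case.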