I am using pandas.eval on a boolean series with missing data. To do this I use an indexer to mark non-null values and .loc to only apply .eval on the rows with non-missing data. Applying the logical not operator using the expression ~bool or not(bool) returns -1 or -2.
I understand that this is because my boolean series is cast to object type because of the missing values, but I am wondering: how should I use .eval on a boolean series with missing data?
Here is a reproducible example using pandas 0.20.3.
df = pd.DataFrame({'bool': [True, False, None]})
bool
0 True
1 False
2 None
indexer = ~pd.isnull(df['bool'])
0 True
1 True
2 False
Name: bool, dtype: bool
df.loc[indexer].eval('~bool')
0 -2
1 -1
Name: bool, dtype: object
To check for missing values in a pandas DataFrame, use the functions isnull() and notnull(). Both functions help in checking whether a value is NaN or not, and both can also be applied to a pandas Series to find null values.
The fillna() method replaces missing values with a specified value. It replaces the NaN or NA values in the entire Series object. Its value argument specifies the replacement to use for NaN entries; by default it is None.
isna detects missing values for an array-like object. It takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).
For eval, ~ maps to op.invert, as seen in the source code here.
_unary_ops_syms = '+', '-', '~', 'not'
_unary_ops_funcs = op.pos, op.neg, op.invert, op.invert
_unary_ops_dict = dict(zip(_unary_ops_syms, _unary_ops_funcs))
Thus when your Series is of good old object type, what you're seeing here is
>>> ~True
-2
>>> ~False
-1
# or with your Series
>>> ~pd.Series(True, dtype='object')
0 -2
dtype: object
Where you want
>>> ~pd.Series(True)
0 False
dtype: bool
The outputs ~True -> -2 and ~False -> -1 arise because bool is a subclass of int in Python, and -2 and -1 are the bitwise complements of 1 and 0 respectively.
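As a quick check of that explanation, here is a minimal plain-Python sketch (no pandas involved). It uses operator.invert, the same function the answer says ~ maps to, and compares the result with the integer identity ~x == -x - 1. The variable names are illustrative only, and the warning filter is there because recent Python versions emit a DeprecationWarning when ~ is applied to a bool.

```python
import operator
import warnings

# bool is a subclass of int, so True and False behave like 1 and 0.
assert issubclass(bool, int)

results = {}
with warnings.catch_warnings():
    # Newer Python versions warn about bitwise inversion on bool; silence it here.
    warnings.simplefilter("ignore", DeprecationWarning)
    for value in (True, False):
        # operator.invert(x) is the function form of ~x.
        results[value] = operator.invert(value)

for value, complement in results.items():
    # For any integer x, the bitwise complement equals -x - 1.
    print(value, complement, -int(value) - 1)
```

Each printed row shows the value, its bitwise complement, and -x - 1; the last two columns agree, which is why an object-dtype Series of Python bools comes back as integers rather than flipped booleans.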
The obvious solution is to convert the Series to bool type beforehand with astype(bool) in an extra step or, if for some reason you cannot do so before the eval, to cast inside the expression:
>>> df.loc[indexer].eval('~bool.astype("bool")')
0 False
1 True
Name: bool, dtype: bool
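For the first option, converting beforehand in an extra step, a minimal sketch might look like the following. It reuses the DataFrame from the question; the names clean and inverted are illustrative, and it applies ~ directly to the Series rather than going through eval, on the assumption that the cast is the part that matters.

```python
import pandas as pd

df = pd.DataFrame({'bool': [True, False, None]})

# Mark the rows that actually hold a value.
indexer = df['bool'].notnull()

# Select the non-missing rows and cast them from object to a real bool dtype.
clean = df.loc[indexer, 'bool'].astype(bool)

# On a bool-dtype Series, ~ is a logical not instead of an integer complement.
inverted = ~clean
print(inverted)
```

Casting only after dropping the missing rows matters: astype(bool) on the full column would silently turn None into False.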