
Why does pandas "None | True" return False when Python "None or True" returns True?

In pure Python, None or True returns True.
However, in pandas, when I apply | to two Series and one of them contains None, the result is not what I expected:

>>> df.to_dict()
{'buybox': {0: None}, 'buybox_y': {0: True}}
>>> df
    buybox  buybox_y
0   None    True

>>> df['buybox'] = (df['buybox'] | df['buybox_y'])
>>> df
    buybox  buybox_y
0   False   True

Expected result:

>>> df
    buybox  buybox_y
0   True    True

I get the result I want by applying the OR operation twice, but I don't understand why that should be necessary.

I'm not looking for a workaround (I already have one: applying df['buybox'] = (df['buybox'] | df['buybox_y']) twice in a row) but for an explanation, hence the 'why' in the title.

politinsa asked Apr 06 '21 14:04



1 Answer

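The pandas | operator is not built on Python's or expression, and it behaves differently: or returns one of its two operands based on truthiness, whereas | on a Series is an element-wise operation.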

If both operands are boolean, the result is well defined and is the same in Python and pandas.

But in your case the "buybox" Series has dtype object, while "buybox_y" has dtype bool. In this situation the pandas | operator is not commutative:

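  • the right operand is coerced to boolean (a None there becomes False)
  • then the bitwise or is attempted element by element
    • None | True is an invalid operation, so it results in None
  • and finally the result is coerced to boolean, which turns None into False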

Thus,

>>> df['buybox'] | df['buybox_y']
0  False

>>> df['buybox_y'] | df['buybox']
0  True
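
The coercion steps listed above can be emulated in plain Python to see where the asymmetry comes from. This is only a rough sketch of the described behavior, not pandas' actual implementation; the helper names to_bool and emulated_or are made up for illustration.

```python
def to_bool(value):
    """Hypothetical helper: treat None as False, otherwise use truthiness."""
    return False if value is None else bool(value)


def emulated_or(left, right):
    """Emulate the described object | bool steps for a single element."""
    # Step 1: only the right operand is coerced to boolean up front.
    right = to_bool(right)
    # Step 2: attempt the bitwise OR; None | True raises TypeError in Python.
    try:
        result = left | right
    except TypeError:
        result = None  # the invalid operation yields a missing value
    # Step 3: coerce the result to boolean, so None becomes False.
    return to_bool(result)


print(emulated_or(None, True))  # False: None on the left survives until step 3
print(emulated_or(True, None))  # True: None on the right is cleaned in step 1
print(emulated_or(emulated_or(None, True), True))  # True: second pass is bool | bool
```

The last call mirrors applying the operation twice: the first pass leaves a plain False on the left, so the second pass is an ordinary False | True.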

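This is also why applying the operation twice gives True: the first pass stores a boolean False in "buybox", so the second pass is a plain bool | bool operation, False | True.

For predictable results, clean up the data and cast it to a boolean dtype with astype before attempting boolean operations.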
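
As a minimal sketch of that cleanup, the example below rebuilds the one-row frame from the question and converts "buybox" to bool before combining. Treating missing values as False is an assumption made here for illustration; the helper to_bool and the variable names are not from the original post.

```python
import pandas as pd

# Same data as in the question: an object column holding None, and a bool column.
df = pd.DataFrame({"buybox": [None], "buybox_y": [True]})


def to_bool(value):
    # Assumed cleanup rule: missing values (None/NaN) count as False.
    return False if pd.isna(value) else bool(value)


# Map each element explicitly, then make sure the dtype is bool.
buybox_clean = df["buybox"].map(to_bool).astype(bool)

# With bool on both sides, operand order no longer matters.
left_first = buybox_clean | df["buybox_y"]
right_first = df["buybox_y"] | buybox_clean
print(left_first.tolist(), right_first.tolist())  # [True] [True]
```

Mapping explicitly rather than calling astype(bool) directly on the object column avoids one pitfall: NaN is truthy in Python, so a direct cast would turn it into True.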

paiv answered Nov 03 '22 07:11
