Pandas/Numpy NaN None comparison

In Python with pandas and NumPy, why are these comparison results different?

from pandas import Series
from numpy import nan as NaN  # the np.NaN alias was removed in NumPy 2.0

NaN is not equal to NaN

>>> NaN == NaN
False

but lists and tuples containing NaN compare equal:

>>> [NaN] == [NaN], (NaN,) == (NaN,)
(True, True)

Yet Series containing NaN are unequal again:

>>> Series([NaN]) == Series([NaN])
0    False
dtype: bool

And None:

>>> None == None, [None] == [None]
(True, True)

But with Series:

>>> Series([None]) == Series([None])
0    False
dtype: bool 

This answer explains why NaN == NaN is False in general, but does not explain its behaviour inside Python/pandas collections.

asked Sep 21 '18 03:09 by Chas


People also ask

What is the difference between None and NaN in pandas?

NaN can be used as a numerical value in mathematical operations, while None cannot (or at least shouldn't). NaN is a numeric value, as defined in the IEEE 754 floating-point standard. None is Python's built-in null object (the only instance of NoneType) and in this context means "nonexistent" or "empty" rather than "numerically invalid".
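A quick way to see the difference: NaN propagates through arithmetic, while None raises a TypeError. A minimal sketch:

```python
import numpy as np

# NaN is an IEEE 754 float value, so arithmetic with it is allowed and yields NaN
nan_sum = np.nan + 1
print(type(np.nan), nan_sum)   # <class 'float'> nan

# None is the single instance of NoneType and does not support arithmetic
none_error = None
try:
    None + 1
except TypeError as exc:
    none_error = exc
    print("TypeError:", exc)
```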

Is NaN in pandas None?

In pandas, None is also treated as a missing value. None is a built-in constant in Python. For numeric columns, None is converted to NaN when a DataFrame or Series containing None is created, or when None is assigned to an element.
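Both conversions described above can be observed directly. A minimal sketch, assuming float-compatible numeric data:

```python
import pandas as pd

# Creating a numeric Series with None: pandas stores it as NaN in a float64 column
s = pd.Series([1, None, 3])
print(s.dtype)            # float64
print(s.isna().tolist())  # [False, True, False]

# Assigning None into an existing float column also stores NaN
t = pd.Series([1.0, 2.0])
t[1] = None
print(t.isna().tolist())  # [False, True]
```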

Is np.nan == np.nan?

No, nan is NOT equal to nan. At first, reading that np.nan == np.nan is False can be confusing and frustrating.
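Because of this, testing for NaN with == always fails; a dedicated predicate is needed instead. A minimal sketch using math.isnan, np.isnan and pd.isna:

```python
import math

import numpy as np
import pandas as pd

x = np.nan
print(x == x)   # False: under IEEE 754, NaN is never equal to itself
print(x != x)   # True

# Predicates that actually detect NaN
print(math.isnan(x), np.isnan(x), pd.isna(x))   # True True True
```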

Are NaN and null the same in pandas?

Pandas treats None and NaN as essentially interchangeable for indicating missing or null values. To support this convention, it provides several functions for detecting, removing, and replacing null values in a DataFrame: isnull(), notnull(), dropna() and fillna().
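A small sketch of these functions on a DataFrame that mixes None and NaN; the column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data: column "a" holds a None, column "b" holds a None
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})

print(df.isnull())    # True wherever a value is None or NaN
print(df.notnull())   # the element-wise inverse of isnull()
print(df.dropna())    # keeps only rows with no missing values (row 0 here)
print(df.fillna(0))   # replaces every missing value with 0
```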


1 Answer

As explained here and here, and in the Python docs on sequence equality:

element identity is compared first, and element comparison is performed only for distinct elements.

Because np.nan and np.NaN refer to the same object (i.e. np.nan is np.nan is np.NaN evaluates to True), the equality [np.nan] == [np.nan] holds. (np.NaN is an alias that was removed in NumPy 2.0.) On the other hand, float('nan') creates a new object on every call, so [float('nan')] == [float('nan')] is False.
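The identity shortcut is not specific to NaN. In the sketch below, the hypothetical class NeverEqual claims to be unequal to everything, yet a list holding the same instance still compares equal to itself, because list comparison checks identity before calling __eq__:

```python
class NeverEqual:
    """Hypothetical class whose __eq__ always reports inequality."""

    def __eq__(self, other):
        return False


obj = NeverEqual()
print(obj == obj)       # False: __eq__ is called directly
print([obj] == [obj])   # True: same object, so identity short-circuits __eq__

nan = float("nan")
print([nan] == [nan])                     # True: one object, compared by identity first
print([float("nan")] == [float("nan")])   # False: two distinct NaN objects
```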

Pandas/NumPy element-wise comparisons do not take this identity shortcut, so NaN never compares equal there:

>>> pd.Series([np.NaN]).eq(pd.Series([np.NaN]))[0], (pd.Series([np.NaN]) == pd.Series([np.NaN]))[0]
(False, False)

However, the dedicated equals method treats NaNs in the same location as equal:

>>> pd.Series([np.NaN]).equals(pd.Series([np.NaN]))
True
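If an element-wise result is needed that treats NaNs in the same position as equal (which neither == nor eq does), one option is to combine == with isna() masks. A minimal sketch; the helper name nan_equal is hypothetical, not a pandas API:

```python
import numpy as np
import pandas as pd


def nan_equal(a: pd.Series, b: pd.Series) -> pd.Series:
    """Hypothetical helper: element-wise equality where NaN matches NaN."""
    return (a == b) | (a.isna() & b.isna())


a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 4.0])

print((a == b).tolist())          # [True, False, False]
print(nan_equal(a, b).tolist())   # [True, True, False]
print(a.equals(b))                # False: position 2 differs
```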

None is treated differently. NumPy considers two None values equal:

>>> pd.Series([None, None]).values == (pd.Series([None, None])).values
array([ True,  True])

while pandas does not:

>>> pd.Series([None, None]) == (pd.Series([None, None]))
0    False
1    False
dtype: bool

There is also an inconsistency between the == operator and the eq method, which is discussed here:

>>> pd.Series([None, None]).eq(pd.Series([None, None]))
0    True
1    True
dtype: bool

Tested on pandas 0.23.4 and NumPy 1.15.0.

answered Oct 03 '22 13:10 by hellpanderr