Pandas/Numpy NaN None comparison

Tags: python, python-3.x, pandas, nan, nonetype

Why do NaN and None comparisons give different results in plain Python, pandas, and NumPy?

from pandas import Series
from numpy import NaN

NaN is not equal to NaN:

>>> NaN == NaN
False

but a list or tuple containing NaN is equal to another one containing the same NaN:

>>> [NaN] == [NaN], (NaN,) == (NaN,)
(True, True)

While Series with NaN are not equal again:

>>> Series([NaN]) == Series([NaN])
0    False
dtype: bool

None is equal to None, both on its own and inside a list:

>>> None == None, [None] == [None]
(True, True)

While Series with None are not equal:

>>> Series([None]) == Series([None])
0    False
dtype: bool

This answer explains why NaN == NaN is False in general, but it does not explain the behaviour inside Python and pandas collections.


asked Sep 21 '18 03:09

Chas

1 Answer

As explained here, and here, and in the Python docs, when sequences are checked for equality,

element identity is compared first, and element comparison is performed only for distinct elements.

Because np.nan and np.NaN refer to the same object, i.e. (np.nan is np.nan is np.NaN) == True, the equality [np.nan] == [np.nan] holds. (np.NaN is an alias that exists only in NumPy versions before 2.0.) On the other hand, float('nan') creates a new object on every call, so [float('nan')] == [float('nan')] is False.
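The identity-first rule can be seen without NumPy at all. The following is a minimal sketch in plain Python; the variable names (shared_nan, same_object_equal and so on) are only illustrative and do not come from the question.

```python
import math

# One NaN object, reused on both sides of the comparison.
shared_nan = float('nan')

# The value itself never compares equal, even to itself.
scalar_equal = shared_nan == shared_nan

# Sequence comparison checks identity first, so the shared object matches.
same_object_equal = [shared_nan] == [shared_nan]

# Two separate NaN objects fail the identity check, then fail ==.
distinct_objects_equal = [float('nan')] == [float('nan')]

# Membership tests use the same identity-then-equality rule.
found_by_identity = shared_nan in [shared_nan]
found_distinct = float('nan') in [shared_nan]

print(scalar_equal, same_object_equal, distinct_objects_equal)
print(found_by_identity, found_distinct)

# math.isnan tests the value and does not depend on object identity.
print(math.isnan(shared_nan), math.isnan(float('nan')))
```

If the goal is to test for NaN rather than to compare containers, math.isnan is the more dependable check, since it does not matter which NaN object is involved.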

pandas and NumPy do not take this identity shortcut for float data. They compare the values elementwise, so NaN is never equal to NaN:

>>> import numpy as np
>>> import pandas as pd
>>> pd.Series([np.NaN]).eq(pd.Series([np.NaN]))[0], (pd.Series([np.NaN]) == pd.Series([np.NaN]))[0]
(False, False)

However, the special equals method treats NaNs in the same location as equal:

>>> pd.Series([np.NaN]).equals(pd.Series([np.NaN]))
True
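Note that equals returns a single bool for the whole Series. If an elementwise result is needed in which NaNs at the same position count as equal, one possible approach is to combine == with isna. This is a sketch under that assumption, not something taken from the answer; the Series a and b are made-up sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: NaN at position 1 in both, different values at position 2.
a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 4.0])

# Plain elementwise comparison: the NaN position compares False.
plain = a == b

# NaN-aware elementwise comparison: equal values, or NaN on both sides.
nan_aware = (a == b) | (a.isna() & b.isna())

# Whole-Series check: a single bool, NaNs in the same location count as equal.
whole_equal = a.equals(b)
whole_equal_copy = a.equals(a.copy())

print(plain.tolist())
print(nan_aware.tolist())
print(whole_equal, whole_equal_copy)
```

The mask form keeps the per-position detail that equals discards, which helps when you need to locate the rows that actually differ.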

None is treated differently. NumPy considers two None values equal:

>>> pd.Series([None, None]).values == (pd.Series([None, None])).values
array([ True,  True])

while pandas does not:

>>> pd.Series([None, None]) == (pd.Series([None, None]))
0    False
1    False
dtype: bool

There is also an inconsistency between the == operator and the eq method, which is discussed here:

>>> pd.Series([None, None]).eq(pd.Series([None, None]))
0    True
1    True
dtype: bool

Tested on pandas 0.23.4 and numpy 1.15.0.


answered Oct 03 '22 13:10

hellpanderr
