
Python's and Numpy's nan and set

I ran into unexpected behavior with Python's set, NumPy and NaN (not-a-number):

>>> import numpy as np
>>> set([np.float64('nan'), np.float64('nan')])
set([nan, nan])
>>> set([np.float32('nan'), np.float32('nan')])
set([nan, nan])
>>> set([np.float('nan'), np.float('nan')])
set([nan, nan])
>>> set([np.nan, np.nan])
set([nan])
>>> set([float('nan'), float('nan')])
set([nan, nan])

Here np.nan yields a single-element set, while the NaNs created by NumPy's constructors (np.float64('nan'), np.float32('nan'), np.float('nan')) yield two NaNs in the set. So does float('nan')! And note that:

>>> type(float('nan')) == type(np.nan)
True
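
The same split can be reproduced with the standard library alone. This is a minimal sketch, assuming Python 3: math.nan stands in for np.nan (it is likewise one module-level float object returned on every access), while each float('nan') call plays the role of the np.float64('nan')-style constructors.

```python
import math

# Two separate float('nan') calls create two distinct NaN objects.
fresh = {float('nan'), float('nan')}

# math.nan is a single shared object, like np.nan (stand-in assumption).
shared = {math.nan, math.nan}

# Explicitly reusing one NaN object behaves the same way.
x = float('nan')
reused = {x, x}

print(len(fresh), len(shared), len(reused))  # 2 1 1
print(type(float('nan')) == type(math.nan))  # True
```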

I wonder how this difference comes about and what the rationale is behind the different behaviors.

Finn Årup Nielsen asked Apr 09 '15 17:04


1 Answer

One of the properties of NaN is that NaN != NaN: unlike every other float value, it does not even compare equal to itself. However, when set looks for an existing member at a hash index, it first checks whether the candidate is the very same object (same id(x)) as that member, and only then falls back to ==. If you have two objects with different ids that both hold the value NaN, the identity check fails and so does ==, so you get two entries in the set. If they are the same object, they collapse into a single entry.
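
A simplified model of that matching rule might look like the sketch below. ToySet is a hypothetical class, not CPython's actual implementation (which uses an open-addressing hash table and also compares stored hashes); it only captures the identity-before-equality check described above.

```python
class ToySet:
    """Hypothetical, simplified model of set insertion.

    Members are grouped by hash; a new item counts as a duplicate if it is
    the same object as an existing member or compares equal to it.
    """

    def __init__(self):
        self._buckets = {}  # hash value -> list of members with that hash

    def add(self, item):
        bucket = self._buckets.setdefault(hash(item), [])
        for member in bucket:
            # Identity is tested first, so a NaN matches itself
            # even though nan == nan is False.
            if member is item or member == item:
                return
        bucket.append(item)

    def __len__(self):
        return sum(len(bucket) for bucket in self._buckets.values())


nan = float('nan')
toy = ToySet()
toy.add(nan)
toy.add(nan)           # same object: collapses into the existing entry
toy.add(float('nan'))  # different object: identity and == both fail

real = {nan, nan, float('nan')}
print(len(toy), len(real))  # 2 2
```

The toy model and the built-in set agree here because both accept a match on identity before trying ==.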

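As pointed out by others, np.nan is a single object that will always have the same id, whereas each call to float('nan') or np.float64('nan') creates a new object. That is also why the equal types don't matter: what differs is object identity, not type.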
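
The identity difference can be checked directly. A minimal sketch using only the standard library, with math.nan assumed as a stand-in for np.nan; the NumPy equivalents are shown as comments because they need NumPy installed.

```python
import math

# Each float('nan') call builds a new object, so the two operands differ.
print(float('nan') is float('nan'))  # False

# math.nan (like np.nan) is one object returned on every access.
print(math.nan is math.nan)  # True
print(id(math.nan) == id(math.nan))  # True

# With NumPy installed, the analogous checks would be:
#   np.nan is np.nan                         -> True
#   np.float64('nan') is np.float64('nan')   -> False
```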

Mark Ransom answered Oct 13 '22 01:10