Python's 'set' operator doesn't work with numpy.nan

Question

I noticed a problem converting lists of NaN values to sets:

import pandas as pd
import numpy as np

x = pd.DataFrame({'a':[None,None]})
x_numeric = pd.to_numeric(x['a']) #converts to numpy.float64
set(x_numeric)

This SHOULD return {nan} but instead returns {nan, nan}. However, doing this:

set([np.nan, np.nan])

returns the expected {nan}. The former are apparently class numpy.float64, while the latter are by default class float.

Any idea why set() doesn't work with numpy.float64 NaN values? I'm using Pandas version 0.18 and Numpy version 1.10.4.

Andy Hayden · Accepted Answer

The NaNs in a float64 array are not the same object in memory as np.nan (like every other number in the array, each one occupies its own 8 bytes in the array). We can see this by taking the id:

In [11]: x_numeric
Out[11]:
0   NaN
1   NaN
Name: a, dtype: float64

In [12]: x_numeric.apply(id)
Out[12]:
0    4657312584
1    4657312536
Name: a, dtype: int64

In [13]: id(np.nan)
Out[13]: 4535176264

In [14]: id(np.nan)
Out[14]: 4535176264

It's kind of a Python "gotcha" that this occurs, since it comes from an optimization: before checking equality, a set first checks whether the two values are the same object (same id / location in memory):

In [21]: s = set([np.nan])

In [22]: np.nan in s
Out[22]: True

In [23]: x_numeric.apply(lambda x: x in s)
Out[23]:
0    False
1    False
Name: a, dtype: bool
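The same behaviour can be reproduced without pandas. The following is a minimal sketch that assumes two separately created float NaN objects are a fair stand-in for the NaNs read out of the Series; the names nan_a and nan_b are illustrative only.

```python
# Two separately created NaN objects, standing in for the NaNs that come
# out of a float64 Series (an assumption for illustration; no pandas needed).
nan_a = float("nan")
nan_b = float("nan")

s = {nan_a}

# Same object: the identity check succeeds, so equality is never consulted.
same_object_found = nan_a in s

# Different object: identity fails, and nan_a == nan_b is False, so no match.
other_object_found = nan_b in s

print(same_object_found, other_object_found)

# The same rule decides how many elements a set keeps.
print(len({nan_a, nan_a}), len({nan_a, nan_b}))
```

This is why set([np.nan, np.nan]) collapses to one element (the same object twice) while the set built from the Series keeps two.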

The reason it's a "gotcha" is that NaN, unlike most objects, is not equal to itself:

In [24]: np.nan == np.nan
Out[24]: False
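If the goal is simply a set in which all NaNs count as one value, one possible workaround is to map every NaN to a single shared object before adding it. The sketch below is not part of the original answer; unique_values is a hypothetical helper name, and it relies on math.nan being one module-level object.

```python
import math


def unique_values(values):
    """Return the distinct values, treating every NaN as the same value.

    Hypothetical helper for illustration, not a pandas or NumPy API.
    """
    result = set()
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            # Always add the same NaN object, so the set's identity
            # check collapses all NaNs into a single element.
            result.add(math.nan)
        else:
            result.add(v)
    return result


distinct = unique_values([float("nan"), float("nan"), 1.0, 1.0])
print(len(distinct))
```

Since numpy.float64 is a subclass of float, the isinstance check should also accept the values from x_numeric. Within pandas itself, x_numeric.unique() treats NaN as a single value and may be the simpler route.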

Tags: python, pandas, nan, numpy

tom
