
Python's 'set' operator doesn't work with numpy.nan

I noticed a problem converting lists of NaN values to sets:

import pandas as pd
import numpy as np

x = pd.DataFrame({'a':[None,None]})
x_numeric = pd.to_numeric(x['a']) #converts to numpy.float64
set(x_numeric)

This SHOULD return {nan} but instead returns {nan, nan}. However, doing this:

set([np.nan, np.nan])

returns the expected {nan}. The elements of the former are apparently numpy.float64 objects, while those of the latter are plain Python floats.

Any idea why set() doesn't work with numpy.float64 NaN values? I'm using Pandas version 0.18 and Numpy version 1.10.4.

tom asked Jan 04 '23 07:01



1 Answer

NaNs in a float64 array don't point to the same space in memory as np.nan (they, like every other number in the array, occupy 8 bytes in the array's own buffer), so each element you pull out is a new object. We can see this when we take the id:

In [11]: x_numeric
Out[11]:
0   NaN
1   NaN
Name: a, dtype: float64

In [12]: x_numeric.apply(id)
Out[12]:
0    4657312584
1    4657312536
Name: a, dtype: int64

In [13]: id(np.nan)
Out[13]: 4535176264

In [14]: id(np.nan)
Out[14]: 4535176264

This is something of a Python "gotcha". It stems from an optimization: when testing whether an element is already in a set, Python first checks whether it is the same object (same id / location in memory) and only then falls back to ==:

In [21]: s = set([np.nan])

In [22]: np.nan in s
Out[22]: True

In [23]: x_numeric.apply(lambda x: x in s)
Out[23]:
0    False
1    False
Name: a, dtype: bool
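
The same identity-before-equality behaviour can be reproduced without NumPy or pandas. The following is a minimal sketch, not taken from the original session: the names nan_a and nan_b are made up for illustration, and it assumes CPython, where each float("nan") call creates a new float object.

```python
# Two separately created NaNs: distinct objects (assuming CPython).
nan_a = float("nan")
nan_b = float("nan")

s = {nan_a}

# Equality never holds for NaN, not even against itself...
self_equal = nan_a == nan_a           # False

# ...yet membership succeeds for the very same object, because the
# set compares identity before falling back to ==.
same_object_found = nan_a in s        # True

# A different NaN object fails both the identity and the equality check.
other_object_found = nan_b in s       # False

# Hence a set built from distinct NaN objects keeps all of them,
# while the same object added twice collapses to one entry.
distinct_count = len({nan_a, nan_b})  # 2
shared_count = len({nan_a, nan_a})    # 1
```

This mirrors the question: set([np.nan, np.nan]) passes the same object twice, whereas the values coming out of the float64 Series are separate objects.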

The reason it's a "gotcha" is that NaN, unlike most objects, is not equal to itself:

In [24]: np.nan == np.nan
Out[24]: False
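
If the goal is simply a collection with at most one NaN, one option is to detect NaNs by value instead of relying on set() alone. The helper below is a hypothetical sketch (nan_aware_set is not part of NumPy or pandas). It assumes the values are Python floats or numpy.float64, which is a subclass of float.

```python
import math


def nan_aware_set(values):
    """Build a set in which all NaNs collapse to a single entry.

    Hypothetical helper for illustration only. Assumes NaNs arrive as
    float instances (numpy.float64 subclasses float).
    """
    result = set()
    seen_nan = False
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            # Record that a NaN occurred instead of adding each object.
            seen_nan = True
        else:
            result.add(v)
    if seen_nan:
        # Add one canonical NaN back.
        result.add(math.nan)
    return result


# Distinct NaN objects, as produced when iterating a float64 array.
values = [float("nan"), float("nan"), 1.0, 1.0]
deduplicated = nan_aware_set(values)
```

With the Series from the question, nan_aware_set(x_numeric) would be expected to give {nan}. Calling x_numeric.unique() is another route worth trying, since pandas treats NaNs as equal when computing unique values.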
Andy Hayden answered Jan 15 '23 08:01
