I noticed a problem converting lists of NaN values to sets:
import pandas as pd
import numpy as np
x = pd.DataFrame({'a':[None,None]})
x_numeric = pd.to_numeric(x['a']) #converts to numpy.float64
set(x_numeric)
This SHOULD return {nan} but instead returns {nan, nan}. However, doing this:
set([np.nan, np.nan])
returns the expected {nan}. The former values are apparently of type numpy.float64, while the latter are plain Python floats by default.
Any idea why set() doesn't work with numpy.float64 NaN values? I'm using pandas version 0.18 and NumPy version 1.10.4.
NaNs in a float64 array don't point to the same location in memory as np.nan (like every other number in the array, each one occupies 8 bytes in the array itself). We can see this when we take the id:
In [11]: x_numeric
Out[11]:
0 NaN
1 NaN
Name: a, dtype: float64
In [12]: x_numeric.apply(id)
Out[12]:
0 4657312584
1 4657312536
Name: a, dtype: int64
In [13]: id(np.nan)
Out[13]: 4535176264
In [14]: id(np.nan)
Out[14]: 4535176264
It's kind of a Python "gotcha" that this occurs, since it's an optimization (before checking equality, a set checks whether it's the same object, i.e. whether it has the same id / location in memory):
In [21]: s = set([np.nan])
In [22]: np.nan in s
Out[22]: True
In [23]: x_numeric.apply(lambda x: x in s)
Out[23]:
0 False
1 False
Name: a, dtype: bool
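The identity-before-equality check can be reproduced without NumPy or pandas. The following is a minimal sketch using plain Python floats; the helper same_element is a hypothetical name that only approximates what a set does when it compares a candidate against an element it already holds:

```python
# Two distinct NaN objects versus one NaN object referenced twice.
nan_a = float("nan")
nan_b = float("nan")


def same_element(x, y):
    # Hypothetical helper: approximates the comparison a set performs,
    # identity first and equality only if the objects differ.
    return x is y or x == y


print(same_element(nan_a, nan_a))  # True: same object, so == is never consulted
print(same_element(nan_a, nan_b))  # False: different objects, and NaN != NaN

print(len({nan_a, nan_a}))  # 1
print(len({nan_a, nan_b}))  # 2
```

Every NaN pulled out of the float64 Series is a separate object, so it behaves like nan_b here, whereas np.nan is a single shared object and behaves like nan_a.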
The reason it's a "gotcha" is that NaN, unlike most objects, is not equal to itself:
In [24]: np.nan == np.nan
Out[24]: False
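If the goal is a set in which all NaNs count as one value, one option is to collapse them explicitly before building the set. This is only a sketch, not something from the original answer: unique_values is a hypothetical helper, and it assumes the values are Python floats or float subclasses such as numpy.float64.

```python
import math


def unique_values(values):
    """Return a set of the values in which all NaNs collapse into one entry.

    Hypothetical helper for illustration; assumes NaNs arrive as float
    instances (numpy.float64 is a float subclass, so it qualifies).
    """
    result = set()
    seen_nan = False
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            # Add a single NaN object the first time one is seen.
            if not seen_nan:
                result.add(float("nan"))
                seen_nan = True
        else:
            result.add(v)
    return result


print(len(unique_values([float("nan"), float("nan"), 1.0])))  # 2
```

With pandas, calling x_numeric.unique() before set() should have a similar effect, since unique() treats NaNs as duplicates of each other.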