numpy NaN not always recognized

This puzzles me:

```
import numpy as np

a = np.array([1, 2, np.nan, 3])  # an array with a nan
print(np.isnan(a)[2])            # it truly is a nan
print(a[2])                      # it quacks like a nan
print(np.nan is np.nan)          # nan's can be compared
print(a[2] is np.nan)            # But then, this isn't a nan after all!!??

# Output:
# True
# nan
# True
# False
```

I know we're not allowed to compare NaNs with `==`, but shouldn't comparing them with `is` be allowed? After all, it works when comparing `np.nan` to itself.

Thanks for any hints to what is going on here.

This isn't so much a question about the Python `is` operator, as about what indexing, or unboxing, an element of an array does:

```
In [363]: a=np.array([1,2,np.nan,3])
In [364]: a[2]
Out[364]: nan
In [365]: type(a[2])
Out[365]: numpy.float64
In [366]: a[2] is a[2]
Out[366]: False
```

`a[2]` doesn't simply return `nan`. It returns a `np.float64` object whose value is NaN. Another `a[2]` will produce another `np.float64` object. Two such objects don't match in the `is` sense. That's true for any array element, not just NaN values.
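
A minimal sketch of that last point, using an ordinary (non-NaN) element. The variable names `first` and `second` are only illustrative:

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan, 3.0])

# Index the same position twice and keep both results alive.
first = a[0]
second = a[0]

print(type(first))      # a NumPy scalar type (float64), not a plain Python float
print(first == second)  # True: the two scalars hold equal values
print(first is second)  # False: each indexing built a separate scalar object
```

Equality compares the values, while `is` compares object identity, which is why the two checks disagree here.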

Since `==` doesn't work for NaN, we are stuck with using the `np.isnan` function.

`np.nan` is a unique `float` object (in this session), but `a[2]` is not set to that object.

If the array was defined as an object type:

```
In [376]: b=np.array([1,2,np.nan,3], object)
In [377]: b[2] is np.nan
Out[377]: True
```

Here the `is` test is True, because `b` contains pointers to objects that already exist in memory, including the `np.nan` object. The same would be true for a list constructed like that.
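
To put the three containers side by side, here is a small sketch (the names `obj_arr`, `float_arr` and `plain_list` are made up for illustration):

```python
import numpy as np

obj_arr = np.array([1, 2, np.nan, 3], dtype=object)  # stores references to Python objects
float_arr = np.array([1, 2, np.nan, 3])              # stores raw float64 values
plain_list = [1, 2, np.nan, 3]                       # stores references, like obj_arr

print(obj_arr[2] is np.nan)     # True: the very same float object comes back
print(plain_list[2] is np.nan)  # True: same reason
print(float_arr[2] is np.nan)   # False: indexing builds a new np.float64 scalar
```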

First, at least in NumPy 1.15, `np.nan` happens to be a special singleton, meaning that whenever NumPy has to give you a NaN value of type `float`, it tries to give you the same `np.nan` value.

But this is not documented anywhere, or guaranteed to be true across versions.

This fits into the larger class of values that may or may not be singletons, as an implementation detail.

As a general rule, if your code relies on two equal values of an immutable type being identical or not being identical, your code is wrong.

Here are some examples from a default build of CPython 3.7:

```
>>> import math
>>> a, b = 200, 201
>>> a is b-1
True
>>> a, b = 300, 301
>>> a is b-1
False
>>> 301-1 is 300
True
>>> math.nan is math.nan
True
>>> float('nan') is math.nan
False
>>> float('nan') is float('nan')
False
```

You can learn all of the rules that make all of these things come out that way, but they could all change in a different Python implementation, or in version 3.8, or even in 3.7 built with custom configure options. So, just never compare `1` or `math.nan` or `np.nan` or `''` with `is`; only use it for objects that are specifically documented to be singletons (like `None`, or instances of your own types, of course).
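
As a rough illustration of that rule, the sketch below uses `is` only for `None` and a value test for NaN. The function `describe` is hypothetical and exists only for this example:

```python
import math


def describe(value=None):
    """Classify a value without relying on identity of floats or ints."""
    if value is None:  # fine: None is a documented singleton
        return "missing"
    if isinstance(value, float) and math.isnan(value):  # value test, not identity
        return "nan"
    return "number"


print(describe())              # missing
print(describe(float("nan")))  # nan
print(describe(1.5))           # number
```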

Second, when you index a NumPy array, it has to "unbox" the value by constructing a scalar, of a type appropriate to the array's `dtype`. For a `dtype=float64` array, the scalar value it constructs is a `np.float64`.

So, `a[2]` is guaranteed to be a `np.float64`.

But `np.nan` is not an `np.float64`, it's a `float`.

So, there's no way NumPy can give you `np.nan` when you ask for `a[2]`. Instead, it gives you an `np.float64` with a NaN value.
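
The type mismatch can be checked directly. This is just a sketch of the reasoning above, not a statement about NumPy internals:

```python
import numpy as np

a = np.array([1, 2, np.nan, 3])  # dtype is float64 because of the NaN
elem = a[2]

print(type(np.nan) is float)     # True: np.nan is a plain Python float
print(type(elem) is np.float64)  # True: indexing produced a NumPy scalar
print(isinstance(elem, float))   # True: np.float64 subclasses float...
print(elem is np.nan)            # False: ...but it is still a different object
```

Because the exact types differ, the two can never be the same object, whatever their values are.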

OK, so that's why `a[2] is np.nan` is always False. But why is `a[2] is a[2]` also usually false?

As I mentioned above, NumPy tries to give you `np.nan` whenever it needs to give you a `float` NaN. But, at least in 1.15, it doesn't have any special singleton value to provide whenever it needs to give you a `np.float64` NaN. There's no reason it couldn't, but nobody bothered to write such code, because it shouldn't matter either way to any properly-written app.

So, each time you unbox the value in `a[2]` into a scalar `np.float64`, it gives you a new NaN-valued `np.float64`.

But why isn't this the same as `301-1 is 300`? Well, the reason that works is that the compiler is allowed to fold constants of known immutable type with equal values, and CPython does exactly that, for simple cases, within each compilation unit. But two NaN values aren't equal; a NaN value isn't even equal to itself. So, it can't be constant-folded.

(If you're wondering what happens if you create an array with an int dtype, store small values in it, and check whether they get merged into the small-int singletons: try it and see.)
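
For anyone who would rather read the experiment than run it, here is one way it might be set up. The dtype is pinned to `np.int64` as an assumption, so the result does not depend on the platform's default integer size:

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int64)
one = 1           # CPython's cached small int
elem = ints[0]    # unboxed element

print(type(elem) is np.int64)  # True: a NumPy scalar, not a Python int
print(elem == one)             # True: equal values
print(elem is one)             # False: different types, so never the same object
```

The same unboxing argument applies: the element comes back as a NumPy scalar, so the small-int cache never gets a chance to matter.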

And of course this is why `isnan` exists in the first place. You can't test for NaN with equality (because NaN values are not equal to anything, even themselves), you can't test for NaN with identity (for all of the reasons described above), so you need a function to test for them.
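
A short sketch of the three kinds of test on the question's array, plus the usual mask-based filtering (the names `x` and `mask` are only illustrative):

```python
import math

import numpy as np

a = np.array([1, 2, np.nan, 3])
x = a[2]

print(bool(x == x))   # False: equality cannot detect NaN
print(x is np.nan)    # False: identity cannot detect it either
print(bool(np.isnan(x)))  # True: the dedicated test works on scalars
print(math.isnan(x))  # True: the stdlib version accepts np.float64 too

# np.isnan also works element-wise, which gives a mask for filtering.
mask = np.isnan(a)
print(mask.tolist())      # [False, False, True, False]
print(a[~mask].tolist())  # [1.0, 2.0, 3.0]
```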

Tags: python, nan, numpy

Bastiaan

2 Answers

hpaulj

abarnert
