Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detect if a NumPy array contains at least one non-numeric value?

Tags:

python

numpy

People also ask

Is NaN for Numpy array?

Test element-wise for Not a Number (NaN), return result as a bool array. Input array. For scalar input, the result is a new boolean with value True if the input is NaN; otherwise the value is False.

How do you check if a value is in a Numpy array?

Using Numpy array, we can easily find whether specific values are present or not. For this purpose, we use the “in” operator. “in” operator is used to check whether certain element and values are present in a given sequence and hence return Boolean values 'True” and “False“.


This should be faster than iterating and will work regardless of shape.

numpy.isnan(myarray).any()

Edit: 30x faster:

import timeit
s = 'import numpy;a = numpy.arange(10000.).reshape((100,100));a[10,10]=numpy.nan'
ms = [
    'numpy.isnan(a).any()',
    'any(numpy.isnan(x) for x in a.flatten())']
for m in ms:
    print "  %.2f s" % timeit.Timer(m, s).timeit(1000), m

Results:

  0.11 s numpy.isnan(a).any()
  3.75 s any(numpy.isnan(x) for x in a.flatten())

Bonus: it works fine for non-array NumPy types:

>>> a = numpy.float64(42.)
>>> numpy.isnan(a).any()
False
>>> a = numpy.float64(numpy.nan)
>>> numpy.isnan(a).any()
True

If infinity is a possible value, I would use numpy.isfinite

numpy.isfinite(myarray).all()

If the above evaluates to True, then myarray contains none of numpy.nan, numpy.inf or -numpy.inf.

numpy.isnan will be OK with numpy.inf values, for example:

In [11]: import numpy as np

In [12]: b = np.array([[4, np.inf],[np.nan, -np.inf]])

In [13]: np.isnan(b)
Out[13]: 
array([[False, False],
       [ True, False]], dtype=bool)

In [14]: np.isfinite(b)
Out[14]: 
array([[ True, False],
       [False, False]], dtype=bool)

Pfft! Microseconds! Never solve a problem in microseconds that can be solved in nanoseconds.

Note that the accepted answer:

  • iterates over the whole data, regardless of whether a nan is found
  • creates a temporary array of size N, which is redundant.

A better solution is to return True immediately when NAN is found:

import numba
import numpy as np

NAN = float("nan")

@numba.njit(nogil=True)
def _any_nans(a):
    for x in a:
        if np.isnan(x): return True
    return False

@numba.jit
def any_nans(a):
    if not a.dtype.kind=='f': return False
    return _any_nans(a.flat)

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 573us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 774ns  (!nanoseconds)

and works for n-dimensions:

array1M_nd = array1M.reshape((len(array1M)/2, 2))
assert any_nans(array1M_nd)==True
%timeit any_nans(array1M_nd)  # 774ns

Compare this to the numpy native solution:

def any_nans(a):
    if not a.dtype.kind=='f': return False
    return np.isnan(a).any()

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 456us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 470us

%timeit np.isnan(array1M).any()  # 532us

The early-exit method is 3 orders or magnitude speedup (in some cases). Not too shabby for a simple annotation.


With numpy 1.3 or svn you can do this

In [1]: a = arange(10000.).reshape(100,100)

In [3]: isnan(a.max())
Out[3]: False

In [4]: a[50,50] = nan

In [5]: isnan(a.max())
Out[5]: True

In [6]: timeit isnan(a.max())
10000 loops, best of 3: 66.3 µs per loop

The treatment of nans in comparisons was not consistent in earlier versions.


(np.where(np.isnan(A)))[0].shape[0] will be greater than 0 if A contains at least one element of nan, A could be an n x m matrix.

Example:

import numpy as np

A = np.array([1,2,4,np.nan])

if (np.where(np.isnan(A)))[0].shape[0]: 
    print "A contains nan"
else:
    print "A does not contain nan"