Fastest way to find non-finite values

Tags: python, numpy

This is inspired by the question "python: Combined masking in numpy".

The task is to create a Boolean array marking all values that are not finite. For example:

>>> arr = np.array([0, 2, np.inf, -np.inf, np.nan])
>>> ~np.isfinite(arr)
array([False, False,  True,  True,  True], dtype=bool)

To me, this seemed like the fastest way to find the non-finite values, but there appears to be a faster one. Specifically, np.isnan(arr - arr) gives the same result:

>>> np.isnan(arr - arr)
array([False, False,  True,  True,  True], dtype=bool)

Timing it, we see that it is more than twice as fast:

arr = np.random.rand(100000)

%timeit ~np.isfinite(arr)
10000 loops, best of 3: 198 µs per loop

%timeit np.isnan(arr - arr)
10000 loops, best of 3: 85.8 µs per loop
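The %timeit lines above are IPython magics. A minimal plain-Python sketch of the same comparison, assuming only NumPy and the standard timeit module, could look like this. The helper names not_finite_isfinite and not_finite_isnan_trick are made up for illustration, and the absolute numbers will depend on the machine and the NumPy build.

```python
import timeit

import numpy as np

arr = np.random.rand(100_000)  # uniform in [0, 1): no inf/NaN, as in the question


def not_finite_isfinite(a):
    # The "obvious" version: np.isfinite, then a negation (a second bool array).
    return ~np.isfinite(a)


def not_finite_isnan_trick(a):
    # The trick: a - a is 0.0 for finite values and NaN for inf, -inf and NaN.
    return np.isnan(a - a)


# Only compare timings of functions that agree on the result.
assert np.array_equal(not_finite_isfinite(arr), not_finite_isnan_trick(arr))

for func in (not_finite_isfinite, not_finite_isnan_trick):
    # Best of 5 repeats of 200 calls each, converted to time per call.
    best = min(timeit.repeat(lambda: func(arr), number=200, repeat=5)) / 200
    print(f"{func.__name__}: {best * 1e6:.1f} µs per call")
```

Checking that both masks agree before timing them is cheap insurance: a faster function is only useful if it answers the same question.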

So my question is twofold:

  1. Why is the np.isnan(arr - arr) trick faster than the "obvious" ~np.isfinite(arr) version? Is there any input for which it does not work?

  2. Is there an even faster way to find all non-finite values?

asked Aug 18 '17 at 12:08 by Jonas Adler


1 Answer

That's hard to answer, because np.isnan and np.isfinite can use different C functions depending on the build. How fast those C functions are may in turn depend on the compiler, the system, and how NumPy itself was built, so the timings will differ accordingly.


The ufuncs for both call a NumPy-internal npy_ function (source, NumPy 1.11.3):

/**begin repeat1
 * #kind = isnan, isinf, isfinite, signbit, copysign, nextafter, spacing#
 * #func = npy_isnan, npy_isinf, npy_isfinite, npy_signbit, npy_copysign, nextafter, spacing#
 **/

These functions are in turn defined as macros whose expansion depends on compile-time constants (source, NumPy 1.11.3):

/* use builtins to avoid function calls in tight loops
 * only available if npy_config.h is available (= numpys own build) */
#if HAVE___BUILTIN_ISNAN
    #define npy_isnan(x) __builtin_isnan(x)
#else
    #ifndef NPY_HAVE_DECL_ISNAN
        #define npy_isnan(x) ((x) != (x))
    #else
        #if defined(_MSC_VER) && (_MSC_VER < 1900)
            #define npy_isnan(x) _isnan((x))
        #else
            #define npy_isnan(x) isnan(x)
        #endif
    #endif
#endif

/* only available if npy_config.h is available (= numpys own build) */
#if HAVE___BUILTIN_ISFINITE
    #define npy_isfinite(x) __builtin_isfinite(x)
#else
    #ifndef NPY_HAVE_DECL_ISFINITE
        #ifdef _MSC_VER
            #define npy_isfinite(x) _finite((x))
        #else
            #define npy_isfinite(x) !npy_isnan((x) + (-x))
        #endif
    #else
        #define npy_isfinite(x) isfinite((x))
    #endif
#endif
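Both fallbacks can be mimicked at the Python level. Note that the non-MSVC fallback for npy_isfinite is essentially the trick from the question, with x + (-x) in place of x - x. This sketch only illustrates the arithmetic; which branch a particular NumPy build actually compiles is decided by the preprocessor checks above.

```python
import numpy as np

arr = np.array([0.0, 2.0, np.inf, -np.inf, np.nan])

# Analogue of `#define npy_isnan(x) ((x) != (x))`: under IEEE 754 rules NaN is
# the only value that compares unequal to itself (no warning is emitted).
isnan_by_comparison = arr != arr

# Analogue of `#define npy_isfinite(x) !npy_isnan((x) + (-x))`: x + (-x) is 0
# for finite x and NaN for inf, -inf and NaN. inf + (-inf) sets the "invalid"
# floating-point flag, so that warning is silenced here.
with np.errstate(invalid="ignore"):
    isfinite_by_fallback = ~np.isnan(arr + (-arr))

print(isnan_by_comparison)   # [False False False False  True]
print(isfinite_by_fallback)  # [ True  True False False False]
assert np.array_equal(isnan_by_comparison, np.isnan(arr))
assert np.array_equal(isfinite_by_fallback, np.isfinite(arr))
```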

So it may simply be that, on your machine, np.isfinite has to do (much) more work than np.isnan. But it is equally likely that on another computer, or with another build, np.isfinite is faster or both are equally fast.

So there is probably no hard rule for what the "fastest way" is; it depends on too many factors. Personally, I would just use np.isfinite, because it can be faster (and isn't much slower even in your case), and it makes the intention much clearer.
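Another point in favor of np.isfinite, which also touches the second half of question 1 ("Is there any input for which it does not work?"), is how the two versions behave on unusual input. The sketch below checks a few cases; the helper names are made up, and the behavior described is that of recent NumPy versions. Float, integer and complex arrays give the same mask, but inf - inf triggers NumPy's "invalid value" floating-point handling (a RuntimeWarning under the default error state, an exception under np.errstate(invalid="raise")), and boolean arrays cannot be subtracted at all.

```python
import warnings

import numpy as np


def not_finite_isfinite(a):
    return ~np.isfinite(a)


def not_finite_isnan_trick(a):
    return np.isnan(a - a)


floats = np.array([0.0, 2.0, np.inf, -np.inf, np.nan])
ints = np.array([0, 2, -7])
complexes = np.array([1 + 1j, complex(np.inf, 0), complex(0, np.nan)])
bools = np.array([True, False])

# 1. Floats, integers and complex numbers: both versions give the same mask
#    (the "invalid value" warnings from inf - inf are silenced here).
with np.errstate(invalid="ignore"):
    for a in (floats, ints, complexes):
        assert np.array_equal(not_finite_isfinite(a), not_finite_isnan_trick(a))

# 2. inf - inf raises the IEEE "invalid" flag, so with NumPy's default error
#    state the trick emits a RuntimeWarning, while np.isfinite stays silent.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    not_finite_isnan_trick(floats)
got_runtime_warning = any(issubclass(w.category, RuntimeWarning) for w in caught)

# Under np.errstate(invalid="raise") the same call becomes an exception.
raised_fp_error = False
try:
    with np.errstate(invalid="raise"):
        not_finite_isnan_trick(floats)
except FloatingPointError:
    raised_fp_error = True

# 3. Boolean arrays: NumPy refuses to subtract them, np.isfinite is fine.
raised_type_error = False
try:
    not_finite_isnan_trick(bools)
except TypeError:
    raised_type_error = True

print(got_runtime_warning, raised_fp_error, raised_type_error)
print(not_finite_isfinite(bools))  # [False False]
```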


If you're really into optimizing performance, you can also do the negation in place. This can reduce both run time and memory use by avoiding one temporary array:

import numpy as np
arr = np.random.rand(1000000)

def isnotfinite(arr):
    res = np.isfinite(arr)
    np.bitwise_not(res, out=res)  # in-place
    return res

np.testing.assert_array_equal(~np.isfinite(arr), isnotfinite(arr))
np.testing.assert_array_equal(~np.isfinite(arr), np.isnan(arr - arr))

%timeit ~np.isfinite(arr)
# 3.73 ms ± 4.16 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit isnotfinite(arr)
# 2.41 ms ± 29.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.isnan(arr - arr)
# 12.5 ms ± 772 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Note also that the np.isnan solution is much slower on my computer (Windows 10 64-bit, Python 3.5, NumPy 1.13.1, Anaconda build).
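To make the "temporary array" argument concrete: for a float64 input, arr - arr allocates a full 8-bytes-per-element temporary before np.isnan even runs, whereas a Boolean mask takes 1 byte per element. That is one plausible contributor to the isnan trick being slower on large arrays, though it is not measured here. If the mask is computed repeatedly, the output buffer can also be allocated once and reused through the ufuncs' out parameter; isnotfinite_into below is a hypothetical helper, not a NumPy function.

```python
import numpy as np

arr = np.random.rand(1_000_000)

# Size of the intermediates each approach creates for a float64 input.
diff = arr - arr          # float64 temporary needed by np.isnan(arr - arr)
mask = np.isfinite(arr)   # bool mask: 1 byte per element
print(diff.nbytes, mask.nbytes)  # 8000000 1000000


def isnotfinite_into(a, out):
    """Hypothetical helper: store ~np.isfinite(a) in the preallocated bool array `out`."""
    np.isfinite(a, out=out)
    np.logical_not(out, out=out)  # in place, like np.bitwise_not(res, out=res) above
    return out


buf = np.empty(arr.shape, dtype=bool)  # allocated once, reused on every call
result = isnotfinite_into(arr, buf)
assert result is buf
assert np.array_equal(result, ~np.isfinite(arr))
```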

answered Oct 17 '22 at 15:10 by MSeifert