Consider the array a:
a = np.array([3, 3, np.nan, 3, 3, np.nan])
I could do np.isnan(a).argmax(), but this requires finding all the np.nan values just to locate the first one. Is there a more efficient way?
I've been trying to figure out if I can pass a parameter to np.argpartition such that np.nan gets sorted first as opposed to last.
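As far as I can tell, numpy's sorting and partitioning routines all treat nan as the largest value, with no keyword to flip that; a quick check (illustrative snippet, not part of the original question):

import numpy as np

# np.sort / np.argsort / np.argpartition treat nan as the largest value,
# so the nan indices end up at the back of the ordering, never the front.
a = np.array([3, 3, np.nan, 3, 3, np.nan])
print(np.argsort(a))  # the nan positions (2 and 5) come out last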
EDIT regarding [dup]: There are several reasons this question is different; one is that it concerns isnan.
EDIT regarding the second [dup]: That one is still addressing equality, and its question/answers are old and very possibly outdated.
It might also be worth looking into numba.jit; without it, the vectorized version will likely beat a straightforward pure-Python search in most scenarios, but after compiling the code, the ordinary search takes the lead, at least in my testing:
In [63]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
In [70]: %paste
import numpy as np
import numba

def naive(a):
    # straightforward short-circuiting search
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

def short(a):
    # vectorized: builds the full boolean mask first
    return np.isnan(a).argmax()

@numba.jit
def naive_jit(a):
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

@numba.jit
def short_jit(a):
    return np.isnan(a).argmax()
## -- End pasted text --
In [71]: %timeit naive(a)
100 loops, best of 3: 7.22 ms per loop
In [72]: %timeit short(a)
The slowest run took 4.59 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 37.7 µs per loop
In [73]: %timeit naive_jit(a)
The slowest run took 6821.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.79 µs per loop
In [74]: %timeit short_jit(a)
The slowest run took 395.51 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 144 µs per loop
Edit: As pointed out by @hpaulj in their answer, numpy actually ships with an optimized short-circuited search whose performance is comparable with the JITted search above:
In [26]: %paste
def plain(a):
    return a.argmax()

@numba.jit
def plain_jit(a):
    return a.argmax()
## -- End pasted text --
In [35]: %timeit naive(a)
100 loops, best of 3: 7.13 ms per loop
In [36]: %timeit plain(a)
The slowest run took 4.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.04 µs per loop
In [37]: %timeit naive_jit(a)
100000 loops, best of 3: 6.91 µs per loop
In [38]: %timeit plain_jit(a)
10000 loops, best of 3: 125 µs per loop
I'll nominate a.argmax(). With @fuglede's test array:
In [1]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
In [2]: np.isnan(a).argmax()
Out[2]: 9999
In [3]: np.argmax(a)
Out[3]: 9999
In [4]: a.argmax()
Out[4]: 9999
In [5]: timeit a.argmax()
The slowest run took 29.94 ....
10000 loops, best of 3: 20.3 µs per loop
In [6]: timeit np.isnan(a).argmax()
The slowest run took 7.82 ...
1000 loops, best of 3: 462 µs per loop
I don't have numba installed, so I can't compare that. But my speedup relative to short is greater than @fuglede's 6x. I'm testing in Py3, which accepts < comparisons with np.nan, while Py2 raises a runtime warning. But the code search suggests the result isn't dependent on that comparison.
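A small check (my illustration, not from the original answer) shows that the comparison behaviour isn't what matters here: < with nan is simply False, yet argmax still lands on the first nan.

import numpy as np

print(3.0 < np.nan)                      # False -- ordinary comparisons never pick nan
a = np.array([3.0, 3.0, np.nan, 3.0])
print(a.argmax())                        # 2 -- argmax still treats the nan as maximal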
In /numpy/core/src/multiarray/calculation.c, PyArray_ArgMax plays with axes (moving the one of interest to the end) and delegates the action to arg_func = PyArray_DESCR(ap)->f->argmax, a function that depends on the dtype.
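In rough Python terms, the axis handling amounts to something like this sketch (the helper name argmax_along_axis is mine, not numpy's; the per-dtype C argmax is stood in for by ndarray.argmax on each 1-D slice):

import numpy as np

def argmax_along_axis(arr, axis):
    # move the axis of interest to the end, collapse the rest,
    # then run a 1-D argmax over each remaining row
    moved = np.moveaxis(arr, axis, -1)
    flat = moved.reshape(-1, moved.shape[-1])
    out = np.array([row.argmax() for row in flat])
    return out.reshape(moved.shape[:-1])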
In numpy/core/src/multiarray/arraytypes.c.src, it looks like BOOL_argmax short-circuits, returning as soon as it encounters a True:
for (; i < n; i++) {
    if (ip[i]) {
        *max_ind = i;
        return 0;
    }
}
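In Python terms the same idea is roughly (my sketch, assuming a 1-D boolean array):

def bool_argmax(mask):
    # stop at the first True, exactly like the C loop above
    for i, flag in enumerate(mask):
        if flag:
            return i
    return 0  # argmax of an all-False array is index 0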
And @fname@_argmax also short-circuits on a maximal nan; np.nan is 'maximal' in argmin as well.
#if @isfloat@
    if (@isnan@(mp)) {
        /* nan encountered; it's maximal */
        return 0;
    }
#endif
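So both searches can stop at the first nan they meet; a quick illustration (mine, reusing @fuglede's array):

import numpy as np

a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])
print(a.argmax())  # 9999 -- stops at the first nan
print(a.argmin())  # 9999 -- nan is treated as 'maximal' here too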
Comments from experienced C coders are welcome, but it appears to me that, at least for np.nan, a plain argmax will be as fast as we can get.
Playing with the 9999 in generating a shows that the a.argmax time depends on that value, consistent with short-circuiting.
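For instance (my own illustration, not from the original answer), moving the first nan changes the cost:

import numpy as np

early = np.full(100000, 3.0); early[10] = np.nan
late = np.full(100000, 3.0); late[99990] = np.nan

# In IPython:
#   timeit early.argmax()   # returns almost immediately
#   timeit late.argmax()    # takes proportionally longer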