Consider the following simple test:
import numpy as np
from timeit import timeit
a = np.random.randint(0,2,1000000,bool)
Let us find the index of the first True:
timeit(lambda:a.argmax(), number=1000)
# 0.000451055821031332
This is reasonably fast because numpy short-circuits.
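One way to see the short-circuiting (a sketch; exact timings vary by machine, and the arrays here are my own examples) is to compare an array whose first True appears early with one where it appears only near the end:

```python
import numpy as np
from timeit import timeit

early = np.zeros(1_000_000, dtype=bool)
early[10] = True             # first True near the start
late = np.zeros(1_000_000, dtype=bool)
late[-10] = True             # first True near the end

# With short-circuiting, argmax on `early` stops after ~10 elements,
# while on `late` it must scan almost the whole array.
print(timeit(lambda: early.argmax(), number=1000))
print(timeit(lambda: late.argmax(), number=1000))
```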
It also works on contiguous slices:
timeit(lambda:a[1:-1].argmax(), number=1000)
# 0.0006490410305559635
But not, it seems, on non-contiguous ones. I was mainly interested in finding the last True:
timeit(lambda:a[::-1].argmax(), number=1000)
# 0.3737605109345168
UPDATE: My assumption that the observed slowdown was due to not short-circuiting is inaccurate (thanks @Victor Ruiz). Indeed, in the worst-case scenario of an all-False array
b = np.zeros_like(a)
timeit(lambda:b.argmax(), number=1000)
# 0.04321779008023441
we are still an order of magnitude faster than in the non-contiguous case. I'm ready to accept Victor's explanation that the actual culprit is a copy being made (timings of forcing a copy with .copy() are suggestive). After that it doesn't really matter whether short-circuiting happens or not.
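A quick way to make the copy hypothesis plausible (a sketch; the numbers are machine-dependent, so no expected timings are shown) is to time the copy of the reversed view on its own and compare it with the full reversed argmax:

```python
import numpy as np
from timeit import timeit

a = np.random.randint(0, 2, 1000000, bool)

# Copying the reversed view alone ...
t_copy = timeit(lambda: a[::-1].copy(), number=100)
# ... vs. argmax on the reversed view, which (per the explanation
# above) copies internally before scanning.
t_view = timeit(lambda: a[::-1].argmax(), number=100)
print(t_copy, t_view)
```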
But other step sizes != 1 yield similar behavior:
timeit(lambda:a[::2].argmax(), number=1000)
# 0.19192566303536296
Question: Why does numpy make a copy (UPDATE; originally: why does it not short-circuit) in the last two examples?
And, more importantly: Is there a workaround, i.e. some way to force numpy to avoid the copy (UPDATE; originally: to short-circuit) also on non-contiguous arrays?
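Pending a fix inside numpy itself, one copy-free workaround (a sketch only; `last_true` and the chunk size are my own choices, not a numpy API) is to scan fixed-size contiguous chunks from the end, so every numpy call runs on contiguous memory and the Python loop supplies the short-circuit:

```python
import numpy as np

def last_true(a, chunk=1 << 16):
    """Index of the last True in a 1D bool array, or -1 if all False.

    Walks contiguous chunks from the end of the array; each `any`
    call is fast (contiguous memory, no copy) and the loop stops at
    the first chunk containing a True, giving short-circuit behavior
    without copying the whole array.
    """
    n = len(a)
    for end in range(n, 0, -chunk):
        start = max(0, end - chunk)
        seg = a[start:end]               # contiguous slice: a view, no copy
        if seg.any():
            return start + int(np.flatnonzero(seg)[-1])
    return -1

a = np.zeros(1000000, dtype=bool)
a[123456] = True
print(last_true(a))   # 123456
```

The chunk size trades loop overhead against wasted scanning past the answer; values around 2**14 to 2**17 are a reasonable starting point.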
I got interested in solving this problem, so I've come up with the following solution, which avoids the a[::-1] problem case caused by internal ndarray copies inside np.argmax:
I created a small library that implements a function argmax, a wrapper of np.argmax with improved performance when the input argument is a 1D boolean array with stride value -1:
https://github.com/Vykstorm/numpy-bool-argmax-ext
For those cases, it uses a low-level C routine to find the index k of an item with maximum value (True), scanning from the end to the beginning of the array a.
You can then compute argmax(a[::-1]) as len(a)-k-1.
The low-level method doesn't perform any internal ndarray copies, because it operates on the array a, which is already C-contiguous and aligned in memory. It also short-circuits.
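The index arithmetic can be sanity-checked in plain numpy (a sketch; here `np.flatnonzero` stands in for the backward C scan): with k the forward index of the last True, the position of that element in the reversed view is len(a)-k-1:

```python
import numpy as np

a = np.random.randint(0, 2, 1000, bool)
a[500] = True                      # guarantee at least one True

# k: forward index of the last True, i.e. what a backward scan finds first
k = int(np.flatnonzero(a)[-1])

# argmax on the reversed view locates the very same element
assert a[::-1].argmax() == len(a) - k - 1
```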
EDIT:
I extended the library so that argmax also performs well with stride values other than -1 (for 1D boolean arrays), with good results: a[::2], a[::-3], etc.
Give it a try.