Consider the following simple test:
import numpy as np
from timeit import timeit
a = np.random.randint(0,2,1000000,bool)
Let us find the index of the first True:
timeit(lambda:a.argmax(), number=1000)
# 0.000451055821031332
This is reasonably fast because numpy short-circuits.
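One way to see the short-circuiting (a sketch; exact timings vary by machine, and the arrays here are my own examples) is to compare an array whose first True appears early with one where it appears only near the end:

```python
import numpy as np
from timeit import timeit

early = np.zeros(1_000_000, dtype=bool)
early[10] = True             # first True near the start
late = np.zeros(1_000_000, dtype=bool)
late[-10] = True             # first True near the end

# With short-circuiting, argmax on `early` stops after ~10 elements,
# while on `late` it must scan almost the whole array.
print(timeit(lambda: early.argmax(), number=1000))
print(timeit(lambda: late.argmax(), number=1000))
```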
It also works on contiguous slices:
timeit(lambda:a[1:-1].argmax(), number=1000)
# 0.0006490410305559635
But not, it seems, on non-contiguous ones. I was mainly interested in finding the last True:
timeit(lambda:a[::-1].argmax(), number=1000)
# 0.3737605109345168
UPDATE: My assumption that the observed slowdown was due to not short-circuiting is inaccurate (thanks @Victor Ruiz). Indeed, in the worst-case scenario of an all-False array
b = np.zeros_like(a)
timeit(lambda:b.argmax(), number=1000)
# 0.04321779008023441
we are still an order of magnitude faster than in the non-contiguous case. I'm ready to accept Victor's explanation that the actual culprit is a copy being made (timings of forcing a copy with .copy() are suggestive). After that it doesn't really matter whether short-circuiting happens or not.
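A quick way to make the copy hypothesis plausible (a sketch; the numbers are machine-dependent, so no expected timings are shown) is to time the copy of the reversed view on its own and compare it with the full reversed argmax:

```python
import numpy as np
from timeit import timeit

a = np.random.randint(0, 2, 1000000, bool)

# Copying the reversed view alone ...
t_copy = timeit(lambda: a[::-1].copy(), number=100)
# ... vs. argmax on the reversed view, which (per the explanation
# above) copies internally before scanning.
t_view = timeit(lambda: a[::-1].argmax(), number=100)
print(t_copy, t_view)
```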
But other step sizes != 1 yield similar behavior:
timeit(lambda:a[::2].argmax(), number=1000)
# 0.19192566303536296
Question: Why does numpy make a copy (UPDATE; originally: why does it not short-circuit) in the last two examples?
And, more importantly: Is there a workaround, i.e. some way to force numpy to avoid the copy (UPDATE; originally: to short-circuit) also on non-contiguous arrays?
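Pending a fix inside numpy itself, one copy-free workaround (a sketch only; `last_true` and the chunk size are my own choices, not a numpy API) is to scan fixed-size contiguous chunks from the end, so every numpy call runs on contiguous memory and the Python loop supplies the short-circuit:

```python
import numpy as np

def last_true(a, chunk=1 << 16):
    """Index of the last True in a 1D bool array, or -1 if all False.

    Walks contiguous chunks from the end of the array; each `any`
    call is fast (contiguous memory, no copy) and the loop stops at
    the first chunk containing a True, giving short-circuit behavior
    without copying the whole array.
    """
    n = len(a)
    for end in range(n, 0, -chunk):
        start = max(0, end - chunk)
        seg = a[start:end]               # contiguous slice: a view, no copy
        if seg.any():
            return start + int(np.flatnonzero(seg)[-1])
    return -1

a = np.zeros(1000000, dtype=bool)
a[123456] = True
print(last_true(a))   # 123456
```

The chunk size trades loop overhead against wasted scanning past the answer; values around 2**14 to 2**17 are a reasonable starting point.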
I got interested in solving this problem, so I've come up with the following solution, which avoids the a[::-1] problem case caused by internal ndarray copies inside np.argmax:
I created a small library that implements a function argmax, a wrapper of np.argmax with improved performance when the input argument is a 1D boolean array with stride value -1:
https://github.com/Vykstorm/numpy-bool-argmax-ext
For those cases, it uses a low-level C routine to find the index k of an item with maximum value (True), scanning from the end to the beginning of the array a.
You can then compute argmax(a[::-1]) as len(a)-k-1.
The low-level method doesn't perform any internal ndarray copies, because it operates on the array a, which is already C-contiguous and aligned in memory. It also short-circuits.
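The index arithmetic can be sanity-checked in plain numpy (a sketch; here `np.flatnonzero` stands in for the backward C scan): with k the forward index of the last True, the position of that element in the reversed view is len(a)-k-1:

```python
import numpy as np

a = np.random.randint(0, 2, 1000, bool)
a[500] = True                      # guarantee at least one True

# k: forward index of the last True, i.e. what a backward scan finds first
k = int(np.flatnonzero(a)[-1])

# argmax on the reversed view locates the very same element
assert a[::-1].argmax() == len(a) - k - 1
```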
EDIT:
I extended the library so that argmax also performs well with stride values other than -1 (for 1D boolean arrays), with good results: a[::2], a[::-3], etc.
Give it a try.