Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why "numpy.any" has no short-circuit mechanism?

I don't understand why a so basic optimization has not yet be done:

In [1]: one_million_ones = np.ones(10**6)
In [2]: %timeit one_million_ones.any()
100 loops, best of 3: 693µs per loop

In [3]: ten_millions_ones = np.ones(10**7)
In [4]: %timeit ten_millions_ones.any()
10 loops, best of 3: 7.03 ms per loop

The whole array is scanned, even if the conclusion is an evidence at first item.

like image 896
B. M. Avatar asked Aug 19 '17 12:08

B. M.


People also ask

Does Python any short circuit?

In python, short-circuiting is supported by various boolean operators and functions.

Why is NumPy so powerful?

NumPy contains a multi-dimensional array and matrix data structures. It can be utilised to perform a number of mathematical operations on arrays such as trigonometric, statistical, and algebraic routines. Therefore, the library contains a large number of mathematical, algebraic, and transformation functions.

What is NumPy explain in short?

NumPy is a Python library used for working with arrays. It also has functions for working in domain of linear algebra, fourier transform, and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely. NumPy stands for Numerical Python.


1 Answers

It's an unfixed performance regression. NumPy issue 3446. There actually is short-circuiting logic, but a change to the ufunc.reduce machinery introduced an unnecessary chunk-based outer loop around the short-circuiting logic, and that outer loop doesn't know how to short circuit. You can see some explanation of the chunking machinery here.

The short-circuiting effects wouldn't have showed up in your test even without the regression, though. First, you're timing the array creation, and second, I don't think they ever put in the short-circuit logic for any input dtype but boolean. From the discussion, it sounds like the details of the ufunc reduction machinery behind numpy.any would have made that difficult.

The discussion does bring up the surprising point that the argmin and argmax methods appear to short-circuit for boolean input. A quick test shows that as of NumPy 1.12 (not quite the most recent version, but the version currently on Ideone), x[x.argmax()] short-circuits, and it outcompetes x.any() and x.max() for 1-dimensional boolean input no matter whether the input is small or large and no matter whether the short-circuiting pays off. Weird!

like image 132
user2357112 supports Monica Avatar answered Sep 17 '22 00:09

user2357112 supports Monica