Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy: efficient way to do "any" or "all" on the result of an operation

Suppose that you have two NumPy arrays, a and b, and you want to test whether any value of a is greater than the corresponding value of b.

Now you could calculate a boolean array and call its any method:

(a > b).any()

This will do all the looping internally, which is good, but it suffers from the need to perform the comparison on all the pairs even if, say, the very first result evaluates as True.

Alternatively, you could do an explicit loop over scalar comparisons. An example implementation in the case where a and b are the same shape (so broadcasting is not required) might look like:

any(ai > bi for ai, bi in zip(a.flatten(), b.flatten()))

This will benefit from the ability to stop processing after the first True result is encountered, but with all the costs associated with an explicit loop in Python (albeit inside a comprehension).

Is there any way, either in NumPy itself or in an external library, that you could pass in a description of the operation that you wish to perform, rather than the result of that operation, and then have it perform the operation internally (in optimised low-level code) inside an "any" loop that can be broken out from?

One could imagine hypothetically some kind of interface like:

from array_operations import GreaterThan, Any

expression1 = GreaterThan('x', 'y')
expression2 = Any(expression1)

print(expression2.evaluate(x=a, y=b))

If such a thing exists, clearly it could have other uses beyond efficient evaluation of all and any, in terms of being able to create functions dynamically.

Is there anything like this?

like image 928
alani Avatar asked Jun 27 '20 10:06

alani


People also ask

Why is NumPy so efficient?

Because the Numpy array is densely packed in memory due to its homogeneous type, it also frees the memory faster. So overall a task executed in Numpy is around 5 to 100 times faster than the standard python list, which is a significant leap in terms of speed.

Is appending to NumPy array efficient?

Appending to numpy arrays is very inefficient. This is because the interpreter needs to find and assign memory for the entire array at every single step. Depending on the application, there are much better strategies. If you know the length in advance, it is best to pre-allocate the array using a function like np.

Does NumPy use lazy evaluation?

NumPy doesn't do this, so the challenge is to present the same interface as NumPy without explicitly using lazy evaluation.

How NumPy array is faster than list?

NumPy Arrays are faster than Python Lists because of the following reasons: An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.


1 Answers

One way to solve this is with delayed/deferred/lazy evaluation. The C++ community uses something called "expression templates" to achieve this; you can find an accessible overview here: http://courses.csail.mit.edu/18.337/2015/projects/TylerOlsen/18337_tjolsen_ExpressionTemplates.pdf

In Python the easiest way to do this is using Numba. You basically just write the function you need in Python using for loops, then you decorate it with @numba.njit and it's done. Like this:

@numba.njit
def any_greater(a, b):
    for ai, bi in zip(a.flatten(), b.flatten()): 
        if ai > bi: 
            return True 
    return False 

There is/was a NumPy enhancement proposal that could help your use case, but I don't think it has been implemented: https://docs.scipy.org/doc/numpy-1.13.0/neps/deferred-ufunc-evaluation.html

like image 172
John Zwinck Avatar answered Oct 21 '22 17:10

John Zwinck