Suppose that you have two NumPy arrays, a and b, and you want to test whether any value of a is greater than the corresponding value of b.
Now you could calculate a boolean array and call its any method:
(a > b).any()
This will do all the looping internally, which is good, but it suffers from the need to perform the comparison on all the pairs even if, say, the very first result evaluates as True.
Alternatively, you could do an explicit loop over scalar comparisons. An example implementation in the case where a and b are the same shape (so broadcasting is not required) might look like:
any(ai > bi for ai, bi in zip(a.flatten(), b.flatten()))
This will benefit from the ability to stop processing after the first True result is encountered, but with all the costs associated with an explicit loop in Python (albeit inside a generator expression).
Is there any way, either in NumPy itself or in an external library, that you could pass in a description of the operation that you wish to perform, rather than the result of that operation, and then have it perform the operation internally (in optimised low-level code) inside an "any" loop that can be broken out from?
One could imagine hypothetically some kind of interface like:
from array_operations import GreaterThan, Any
expression1 = GreaterThan('x', 'y')
expression2 = Any(expression1)
print(expression2.evaluate(x=a, y=b))
If such a thing exists, clearly it could have other uses beyond efficient evaluation of all and any, in terms of being able to create functions dynamically.
Is there anything like this?
Because a NumPy array is densely packed in memory due to its homogeneous type, it also frees the memory faster. So overall a task executed in NumPy can be around 5 to 100 times faster than the same task on a standard Python list, depending on the operation, which is a significant leap in terms of speed.
Appending to numpy arrays is very inefficient. This is because the interpreter needs to find and assign memory for the entire array at every single step. Depending on the application, there are much better strategies. If you know the length in advance, it is best to pre-allocate the array using a function like np.
NumPy doesn't do this, so the challenge is to present the same interface as NumPy without explicitly using lazy evaluation.
NumPy arrays are faster than Python lists for the following reason: an array is a collection of elements of one data type stored in contiguous memory locations. A Python list, on the other hand, holds references to objects that may be of different types and that can live anywhere in memory.
One way to solve this is with delayed/deferred/lazy evaluation. The C++ community uses something called "expression templates" to achieve this; you can find an accessible overview here: http://courses.csail.mit.edu/18.337/2015/projects/TylerOlsen/18337_tjolsen_ExpressionTemplates.pdf
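To make the deferred-evaluation idea a bit more concrete, here is a minimal pure-Python sketch. It borrows the GreaterThan and Any names from the imagined interface in the question; those classes, the evaluate_block method and the block_size parameter are all hypothetical and are not part of NumPy or any library I know of. The expression is only described up front, and the comparison is then run block by block, so the loop can stop at the first block that contains a True. It assumes the operands have the same shape.

```python
import numpy as np


class GreaterThan:
    """Deferred comparison: stores operand names and computes nothing yet."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate_block(self, env, start, stop):
        # Compare one slice of the flattened operands (vectorised in NumPy).
        return env[self.left][start:stop] > env[self.right][start:stop]


class Any:
    """Deferred reduction that stops at the first block containing a True."""

    def __init__(self, expr, block_size=4096):
        self.expr = expr
        self.block_size = block_size  # hypothetical tuning knob

    def evaluate(self, **arrays):
        # Assumption: all operands have the same shape (no broadcasting).
        env = {name: np.ravel(arr) for name, arr in arrays.items()}
        n = next(iter(env.values())).size
        for start in range(0, n, self.block_size):
            block = self.expr.evaluate_block(env, start, start + self.block_size)
            if block.any():
                return True  # early exit: later blocks are never compared
        return False


a = np.arange(10)
b = np.full(10, 5)
expression = Any(GreaterThan('x', 'y'), block_size=4)
found = expression.evaluate(x=a, y=b)
not_found = expression.evaluate(x=b, y=b)
print(found, not_found)
```

The block size trades the per-block Python overhead against the amount of wasted comparison work after the first True, so a real implementation would probably want to tune it.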
In Python the easiest way to do this is using Numba. You basically just write the function you need in Python using for loops, then you decorate it with @numba.njit and it's done. Like this:
import numba

@numba.njit
def any_greater(a, b):
    # Compiled loop: returns as soon as one element of a exceeds b.
    for ai, bi in zip(a.flatten(), b.flatten()):
        if ai > bi:
            return True
    return False
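For what it's worth, here is a self-contained usage sketch of the same idea. It is a variant, not the exact function above: it indexes into a.ravel() and b.ravel(), and it assumes both arrays have the same shape. The try/except fallback is my own addition so the snippet still runs (uncompiled and slowly) when Numba is not installed; the array sizes are arbitrary.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, run the plain Python loop.
    def njit(func):
        return func


@njit
def any_greater(a, b):
    # Flatten both inputs; assumes a and b have the same shape.
    fa = a.ravel()
    fb = b.ravel()
    for i in range(fa.size):
        if fa[i] > fb[i]:
            return True  # stop at the first hit
    return False


a = np.zeros(100_000)
b = np.ones(100_000)

a[0] = 2.0
early_hit = any_greater(a, b)   # the very first pair already satisfies a > b

a[0] = 0.0
no_hit = any_greater(a, b)      # no pair satisfies a > b, so every pair is checked

print(early_hit, no_hit)
```

The first call includes Numba's compilation time, so any timing comparison against (a > b).any() should be done on a second call.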
There is/was a NumPy enhancement proposal that could help your use case, but I don't think it has been implemented: https://docs.scipy.org/doc/numpy-1.13.0/neps/deferred-ufunc-evaluation.html