Suppose I have a bunch of arrays, including x
and y
, and I want to check if they're equal. Generally, I can just use np.all(x == y)
(barring some dumb corner cases which I'm ignoring now).
However this evaluates the entire array of (x == y)
, which is usually not needed. My arrays are really large, and I have a lot of them, and the probability of two arrays being equal is small, so in all likelihood, I really only need to evaluate a very small portion of (x == y)
before the all
function could return False, so this is not an optimal solution for me.
I've tried using the builtin all
function, in combination with itertools.izip
: all(val1==val2 for val1,val2 in itertools.izip(x, y))
However, that just seems much slower in the case that two arrays are equal, that overall, it's stil not worth using over np.all
. I presume because of the builtin all
's general-purposeness. And np.all
doesn't work on generators.
Is there a way to do what I want in a more speedy manner?
I know this question is similar to previously asked questions (e.g. Comparing two numpy arrays for equality, element-wise) but they specifically don't cover the case of early termination.
To check if two NumPy arrays A and B are equal: Use a comparison operator (==) to form a comparison array. Check if all the elements in the comparison array are True.
The Arrays. equals() method checks the equality of the two arrays in terms of size, data, and order of elements. This method will accept the two arrays which need to be compared, and it returns the boolean result true if both the arrays are equal and false if the arrays are not equal.
Compare Two Arrays in Python Using the numpy. array_equiv() Method. The numpy. array_equiv(a1, a2) method takes array a1 and a2 as input and returns True if both arrays' shape and elements are the same; otherwise, returns False .
Comparing two NumPy arrays determines whether they are equivalent by checking if every element at each corresponding index is the same. Method 1: We generally use the == operator to compare two NumPy arrays to generate a new array object.
Until this is implemented in numpy natively you can write your own function and jit-compile it with numba:
import numpy as np
import numba as nb
@nb.jit(nopython=True)
def arrays_equal(a, b):
if a.shape != b.shape:
return False
for ai, bi in zip(a.flat, b.flat):
if ai != bi:
return False
return True
a = np.random.rand(10, 20, 30)
b = np.random.rand(10, 20, 30)
%timeit np.all(a==b) # 100000 loops, best of 3: 9.82 µs per loop
%timeit arrays_equal(a, a) # 100000 loops, best of 3: 9.89 µs per loop
%timeit arrays_equal(a, b) # 100000 loops, best of 3: 691 ns per loop
Worst case performance (arrays equal) is equivalent to np.all
and in case of early stopping the compiled function has the potential to outperform np.all
a lot.
Adding short-circuit logic to array comparisons is apparently being discussed on the numpy page on github, and will thus presumably be available in a future version of numpy.
Probably someone who understands the underlying data structure could optimize this or explain whether it's reliable/safe/good practice, but it seems to work.
np.all(a==b)
Out[]: True
memoryview(a.data)==memoryview(b.data)
Out[]: True
%timeit np.all(a==b)
The slowest run took 10.82 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.2 µs per loop
%timeit memoryview(a.data)==memoryview(b.data)
The slowest run took 8.55 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop
If I understand this correctly, ndarray.data
creates a pointer to the data buffer and memoryview
creates a native python type that can be short-circuited out of the buffer.
I think.
EDIT: further testing shows it may not be as big a time-improvement as shown. previously a=b=np.eye(5)
a=np.random.randint(0,10,(100,100))
b=a.copy()
%timeit np.all(a==b)
The slowest run took 6.70 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 17.7 µs per loop
%timeit memoryview(a.data)==memoryview(b.data)
10000 loops, best of 3: 30.1 µs per loop
np.all(a==b)
Out[]: True
memoryview(a.data)==memoryview(b.data)
Out[]: True
Hmmm, I know it is the poor answer but it seems there is no easy way for this. Numpy Creators should fix it. I suggest:
def compare(a, b):
if len(a) > 0 and not np.array_equal(a[0], b[0]):
return False
if len(a) > 15 and not np.array_equal(a[:15], b[:15]):
return False
if len(a) > 200 and not np.array_equal(a[:200], b[:200]):
return False
return np.array_equal(a, b)
:)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With