
NumPy fast check for complete array equality, like Matlab's isequal

In Matlab, the built-in isequal checks whether two arrays are equal. When they are not equal, it can be very fast, because the implementation presumably stops checking as soon as it finds a difference:

>> A = zeros(1e9, 1, 'single');    
>> B = A(:);                     
>> B(1) = 1;
>> tic; isequal(A, B); toc;
Elapsed time is 0.000043 seconds.

Is there any equivalent in Python/NumPy? all(A==B) or all(equal(A, B)) is far slower, because it compares all elements even if the very first one differs:

In [13]: A = zeros(int(1e9), dtype='float32')

In [14]: B = A.copy()

In [15]: B[0] = 1

In [16]: %timeit all(A==B)
1 loops, best of 3: 612 ms per loop

Does NumPy provide such a function? It should be very easy to implement in C, but it is slow to implement in Python: this is a case where we do not want to operate on the whole array at once, so it would require an explicit loop.
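To make the "explicit loop" concrete, here is a minimal sketch of the pure-Python version. The name isequal_loop is hypothetical and used only for illustration. It does stop at the first difference, but each element comparison goes through the interpreter, so it is only fast when the difference occurs very early:

```python
import numpy as np

def isequal_loop(a, b):
    """Hypothetical helper: element-by-element comparison with early exit."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        return False
    # .flat iterates over the elements without building a temporary array
    for x, y in zip(a.flat, b.flat):
        if x != y:
            return False  # stop at the first mismatch
    return True
```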

Edit:

It appears array_equal has the semantics I want. However, it is no faster than all(A==B), because it is not a built-in but just a short Python function that evaluates A==B. So it does not meet my need for a fast check.

In [12]: %timeit array_equal(A, B)
1 loops, best of 3: 623 ms per loop
gerrit asked Oct 08 '14 15:10


1 Answer

First, it should be noted that in the OP's example the arrays have identical elements because B=A[:] is just a view onto the array, so:

>>> print(A[0], B[0])
1.0 1.0
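As a small illustration of the view-versus-copy distinction, the following sketch uses a tiny array (the size 5 is an arbitrary choice for the example) and np.shares_memory to show which arrays are backed by the same buffer:

```python
import numpy as np

A = np.zeros(5, dtype='float32')
B = A[:]        # basic slice: a view that shares A's memory
C = A.copy()    # an independent copy

B[0] = 1        # writes through to A
C[1] = 2        # leaves A untouched

print(A[0], B[0])               # prints: 1.0 1.0
print(np.shares_memory(A, B))   # prints: True
print(np.shares_memory(A, C))   # prints: False
```

So a fair test of an equality check needs the second array to be made with copy() or array(), not with a plain slice.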

But although the test case is flawed, the basic complaint is valid: NumPy does not have a short-circuiting equality check.

One can easily see from the source that allclose, array_equal, and array_equiv are all just variations on all(A==B), adapted to their respective details, and are not notably faster.
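To illustrate the point, here is a simplified paraphrase of what such a function boils down to. This is not the actual NumPy source, and array_equal_sketch is a hypothetical name; it only shows that the whole boolean array a == b is computed before anything is reduced:

```python
import numpy as np

def array_equal_sketch(a, b):
    """Simplified stand-in for an array_equal-style check (not NumPy's code)."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        return False
    # a == b materializes a full boolean array; only then is it reduced,
    # so a mismatch in the first element does not save any work.
    return bool((a == b).all())
```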

An advantage of numpy, though, is that slices are just views and are therefore very cheap to create, so one can write a short-circuiting comparison fairly easily (I'm not saying this is ideal, but it does work):

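from numpy import *   # note: all() below is numpy's all, not the built-in

A = zeros(10**8, dtype='float32')
B = A[:]        # a view: shares memory with A
B[0] = 1        # so this also sets A[0] to 1
C = array(B)    # a real copy
C[0] = 2        # differs from A in the first element
D = array(A)    # another real copy
D[-1] = 2       # differs from A in the last element

def short_circuit_check(a, b, n):
    # Compare a and b in n chunks, stopping at the first chunk that differs.
    L = -(-len(a) // n)   # ceiling division, so the tail is not skipped
    for i in range(n):
        j = i*L
        if not all(a[j:j+L] == b[j:j+L]):
            return False
    return True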


In [26]: %timeit short_circuit_check(A, C, 100)   # 100x faster
1000 loops, best of 3: 1.49 ms per loop

In [27]: %timeit all(A==C)
1 loops, best of 3: 158 ms per loop

In [28]: %timeit short_circuit_check(A, D, 100)
10 loops, best of 3: 144 ms per loop

In [29]: %timeit all(A==D)
10 loops, best of 3: 160 ms per loop
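One possible variation, sketched here under my own assumptions rather than taken from any library, avoids having to pick n: start with a small chunk and grow it geometrically, so that an early difference is found after very little work while a late difference costs only a modest number of extra calls. The name short_circuit_equal and the defaults first_chunk=1024 and growth=4 are arbitrary choices for illustration:

```python
import numpy as np

def short_circuit_equal(a, b, first_chunk=1024, growth=4):
    """Chunked equality check with geometrically growing chunk sizes."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        return False
    # ravel() returns a view for contiguous arrays (it copies otherwise)
    a = a.ravel()
    b = b.ravel()
    start, size, n = 0, first_chunk, a.size
    while start < n:
        stop = min(start + size, n)
        # each chunk is compared with the usual vectorized check
        if not np.array_equal(a[start:stop], b[start:stop]):
            return False   # stop at the first chunk that differs
        start = stop
        size *= growth     # later chunks are larger, so fewer Python-level calls
    return True
```

When the arrays are equal, or differ only near the end, this still has to look at every element, so it cannot beat all(A==B) by much in that case.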
tom10 answered Nov 11 '22 19:11