
Python: Elementwise comparison of same shaped arrays

Tags: python, numpy

I have n matrices of the same size and want to see how many cells are equal to each other across all matrices. Code:

import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[5,6,7], [4,2,6], [7, 8, 9]])
c = np.array([[2,3,4],[4,5,6],[1,2,5]])

#Intuition is below but is wrong
a == b == c

How do I get Python to return a value of 2 (cells 2,1 and 2,3 match in all 3 matrices) or an array of [[False, False, False], [True, False, True], [False, False, False]]?

ZacharyST asked Aug 07 '15 22:08



2 Answers

You can do:

(a == b) & (b==c)

[[False False False]
 [ True False  True]
 [False False False]]
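To get the count the question asks for, the boolean mask can simply be counted. The sketch below uses the arrays from the question (with `c` written with its outer brackets) and also shows why the chained form fails: Python expands `a == b == c` to `(a == b) and (b == c)`, and `and` asks the array for a single truth value. The names `chained_ok`, `mask` and `count` are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# The chained comparison expands to (a == b) and (b == c); `and` needs one
# truth value, which NumPy refuses to give for a multi-element array.
try:
    a == b == c
    chained_ok = True
except ValueError:
    chained_ok = False

mask = (a == b) & (b == c)      # elementwise AND of the two comparisons
count = np.count_nonzero(mask)  # number of cells equal in all three arrays

print(mask)
print(count)
```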

For n items in, say, a list like x=[a, b, c, a, b, c], one could do:

r = x[0] == x[1]
for temp in x[2:]:
    r &= x[0]==temp

The result is now in r.
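The same accumulation can also be written as a reduction instead of an explicit loop. This is only a sketch, assuming the list holds at least two arrays of identical shape; `all_equal` is a hypothetical helper name, not something from NumPy.

```python
import numpy as np

def all_equal(arrays):
    """Return a mask that is True where every array agrees with the first one."""
    first = arrays[0]
    # Compare each remaining array to the first, then AND all the masks together.
    return np.logical_and.reduce([first == other for other in arrays[1:]])

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

x = [a, b, c, a, b, c]
r = all_equal(x)
print(r)
print(np.count_nonzero(r))
```

Unlike the in-place loop, this builds every intermediate mask before reducing, so it may use more memory for long lists.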

If the structure is already in a 3D numpy array, one could also use:

np.amax(x,axis=2)==np.amin(x,axis=2)

The idea for the above line is that, although it would be ideal to have an equality function with an axis argument, there isn't one, so this line relies on the fact that if amin == amax along an axis, then all elements along that axis are equal.
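As a small illustration of the min == max idea, here is a sketch that stacks the arrays from the question with `np.dstack` and checks the result against a second formulation that compares every layer to the first one. The second formulation is an addition for comparison, not part of the original answer; it may be useful for dtypes where min and max are not meaningful.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

x = np.dstack((a, b, c))  # shape (3, 3, 3); x[i, j] holds [a[i, j], b[i, j], c[i, j]]

# If the smallest and largest values along the last axis coincide,
# every value along that axis is the same.
mask_minmax = np.amax(x, axis=2) == np.amin(x, axis=2)

# Alternative: compare every layer to the first one (broadcast), then require all True.
mask_first = (x == x[:, :, :1]).all(axis=2)

print(mask_minmax)
print(np.array_equal(mask_minmax, mask_first))
```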


If the different arrays to be compared aren't already in a 3D numpy array (or won't be in the future), looping over the list is a fast and easy approach. Although I generally agree with avoiding Python loops for Numpy arrays, this seems like a case where it's easier and faster (see below) to use a Python loop, since the loop is only along a single axis and it's easy to accumulate the comparisons in place. Here's a timing test:

import numpy as np
from timeit import timeit

def f0(x):  # loop over the list, accumulating in place
    r = x[0] == x[1]
    for y in x[2:]:
        r &= x[0]==y
    return r

def f1(x):  # from @Divakar
    return ~np.any(np.diff(np.dstack(x),axis=2),axis=2)

def f2(x):  # min == max
    x = np.dstack(x)
    return np.amax(x,axis=2)==np.amin(x,axis=2)

# speed test
for n, size, reps in ((1000, 3, 1000), (10, 1000, 100)):
    x = [np.ones((size, size)) for i in range(n)]
    print(n, size, reps)
    for name in ("f0", "f1", "f2"):
        print(name + ": ", timeit(name + "(x)", globals=globals(), number=reps))
    print()

1000 3 1000
f0:  1.14673900604  # loop
f1:  3.93413209915  # diff
f2:  3.93126702309  # min max

10 1000 100
f0:  2.42633581161  # loop
f1:  27.1066679955  # diff
f2:  25.9518558979  # min max

If the arrays are already in a single 3D numpy array (e.g., from using x = np.dstack(x) in the above), then adapting the function definitions above accordingly gives:

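def g0(x):  # loop over the third axis
    r = x[:,:,0] == x[:,:,1]
    # start at index 2; layers 0 and 1 were already compared above
    for iy in range(2, x.shape[2]):
        r &= x[:,:,0]==x[:,:,iy]
    return r

def g1(x):   # from @Divakar
    return ~np.any(np.diff(x,axis=2),axis=2)

def g2(x):  # min == max
    return np.amax(x,axis=2)==np.amin(x,axis=2)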

which yields:

1000 3 1000
g0:  3.9761030674      # loop
g1:  0.0599548816681   # diff
g2:  0.0313589572906   # min max

10 1000 100
g0:  10.7617051601     # loop
g1:  10.881870985      # diff
g2:  9.66712999344     # min max

Note also that for a list of large arrays f0 takes about 2.4 s, while for a pre-built array g0, g1 and g2 all take about 10 s, so if the input arrays are large, then the fastest approach, by about 4x, is to store them separately in a list. I find this a bit surprising and guess that this might be due to cache swapping (or bad code?), but I'm not sure anyone really cares, so I'll stop this here.
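One thing that may be worth checking, offered only as a guess in the same spirit: `np.dstack` puts the stacking axis last, so each layer `x[:,:,i]` is a strided, non-contiguous view, whereas stacking along the first axis keeps each original array contiguous. The sketch below only demonstrates the layout difference and that both layouts give the same mask; it does not establish that this explains the timings above. The names `depth_last` and `depth_first` are illustrative.

```python
import numpy as np

arrays = [np.ones((4, 4)) for _ in range(3)]

depth_last = np.dstack(arrays)          # shape (4, 4, 3), stacking axis last
depth_first = np.stack(arrays, axis=0)  # shape (3, 4, 4), stacking axis first

# A single layer is a strided view in one layout and a contiguous block in the other.
last_contig = depth_last[:, :, 0].flags["C_CONTIGUOUS"]
first_contig = depth_first[0].flags["C_CONTIGUOUS"]
print(last_contig, first_contig)

# The min == max test works with either layout; only the axis changes.
mask_last = np.amax(depth_last, axis=2) == np.amin(depth_last, axis=2)
mask_first = np.amax(depth_first, axis=0) == np.amin(depth_first, axis=0)
print(np.array_equal(mask_last, mask_first))
```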

tom10 answered Sep 21 '22 14:09


Concatenate along the third axis with np.dstack and take differences along that axis with np.diff, so that identical entries show up as zeros. Then check for cells where all the differences are zero with ~np.any. Thus, you would have a one-liner solution like so -

~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)

Sample run -

In [39]: a
Out[39]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [40]: b
Out[40]: 
array([[5, 6, 7],
       [4, 2, 6],
       [7, 8, 9]])

In [41]: c
Out[41]: 
array([[2, 3, 4],
       [4, 5, 6],
       [1, 2, 5]])

In [42]: ~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)
Out[42]: 
array([[False, False, False],
       [ True, False,  True],
       [False, False, False]], dtype=bool)
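
To make the one-liner easier to follow, here is the same computation split into its three steps, with the intermediate shapes shown. This is just an unpacked sketch of the expression above; the names `stacked`, `steps` and `mask` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

stacked = np.dstack((a, b, c))    # shape (3, 3, 3): one layer per input array
steps = np.diff(stacked, axis=2)  # shape (3, 3, 2): b - a and c - b for each cell
mask = ~np.any(steps, axis=2)     # True where every consecutive difference is zero

print(stacked.shape, steps.shape, mask.shape)
print(mask.sum())
```
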

Divakar answered Sep 21 '22 14:09