Python: Elementwise comparison of same shaped arrays

Tags:

numpy

I have n matrices of the same size and want to see how many cells are equal to each other across all matrices. Code:

import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[5,6,7], [4,2,6], [7, 8, 9]])
c = np.array([[2,3,4],[4,5,6],[1,2,5]])

#Intuition is below but is wrong
a == b == c

How do I get Python to return a value of 2 (cells 2,1 and 2,3 match in all 3 matrices) or an array of [[False, False, False], [True, False, True], [False, False, False]]?


asked Aug 07 '15 22:08

2 Answers

You can do:

(a == b) & (b==c)

[[False False False]
 [ True False  True]
 [False False False]]
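
As for why a == b == c fails: Python chains it as (a == b) and (b == c), and "and" needs a single truth value for the whole array, which numpy refuses to give (ValueError). To also get the count of 2 that the question asks for, the boolean mask can be counted. A minimal sketch using the question's arrays (with c written as a nested list); the names mask and n_equal are only illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# True where a, b and c all hold the same value
mask = (a == b) & (b == c)

# Each True counts as one matching cell
n_equal = np.count_nonzero(mask)
print(n_equal)
```

mask.sum() would give the same number, since True is counted as 1.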

For n items in, say, a list like x=[a, b, c, a, b, c], one could do:

r = x[0] == x[1]
for temp in x[2:]:
    r &= x[0]==temp

The result is now in r.
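
The same accumulation can probably also be written without an explicit loop body by handing the individual comparisons to np.logical_and.reduce. This is only a sketch of an alternative spelling, not something timed in this answer; it builds all the intermediate masks first, so it may use more memory than the in-place loop:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])
x = [a, b, c, a, b, c]

# Loop version from above: accumulate the comparisons in place
r = x[0] == x[1]
for temp in x[2:]:
    r &= x[0] == temp

# Alternative: compare every array with the first, then AND all masks
r_reduce = np.logical_and.reduce([x[0] == y for y in x[1:]])

print(np.array_equal(r, r_reduce))
```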

If the arrays are already stacked in a 3D numpy array x (along the third axis), one could also use:

np.amax(x,axis=2)==np.amin(x,axis=2)

The idea behind the line above: it would be ideal to have an equality function that takes an axis argument, but there isn't one, so this line instead relies on the fact that if amin == amax along an axis, then all elements along that axis are equal.
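
A small sketch of the min==max idea on the question's arrays, assuming they are stacked with np.dstack so that the third axis indexes the individual arrays. The second variant (comparing every layer with the first via broadcasting) is an extra alternative, not something from the answer itself; the variable names are only illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# x[i, j, k] is cell (i, j) of the k-th array; shape is (3, 3, 3)
x = np.dstack((a, b, c))

# All values along axis 2 are equal exactly when their max equals their min
same_minmax = np.amax(x, axis=2) == np.amin(x, axis=2)

# Alternative: compare each layer with layer 0 (kept 3D so it broadcasts)
same_bcast = (x == x[:, :, :1]).all(axis=2)

print(same_minmax)
```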

If the arrays to be compared aren't already in a 3D numpy array (or won't be in the future), looping over the list is a fast and easy approach. Although I generally agree with avoiding Python loops over Numpy arrays, this seems like a case where a Python loop is both easier and faster (see below), since the loop runs along a single axis only and the comparisons are easy to accumulate in place. Here's a timing test:

def f0(x):
    r = x[0] == x[1]
    for y in x[2:]:
        r &= x[0]==y

def f1(x):  # from @Divakar
    r = ~np.any(np.diff(np.dstack(x),axis=2),axis=2)

def f2(x):
    x = np.dstack(x)
    r = np.amax(x,axis=2)==np.amin(x,axis=2)

# speed test
from timeit import timeit

for n, size, reps in ((1000, 3, 1000), (10, 1000, 100)):
    x = [np.ones((size, size)) for i in range(n)]
    print(n, size, reps)
    for name in ("f0", "f1", "f2"):
        # time each variant on the same input
        print(name + ": ", timeit(name + "(x)", "from __main__ import x, " + name, number=reps))
    print()

1000 3 1000
f0:  1.14673900604  # loop
f1:  3.93413209915  # diff
f2:  3.93126702309  # min max

10 1000 100
f0:  2.42633581161  # loop
f1:  27.1066679955  # diff
f2:  25.9518558979  # min max
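
The timings only measure speed, so it may be worth checking separately that the three variants compute the same mask. The sketch below is not part of the benchmark: it restates the list-based functions so that they return their result (the names loop_equal, diff_equal and minmax_equal are made up for this example) and compares them on small random integer arrays:

```python
import numpy as np

def loop_equal(x):
    # Accumulate comparisons against the first array in place
    r = x[0] == x[1]
    for y in x[2:]:
        r &= x[0] == y
    return r

def diff_equal(x):
    # Neighbouring differences along the stacking axis are all zero
    return ~np.any(np.diff(np.dstack(x), axis=2), axis=2)

def minmax_equal(x):
    # Max equals min along the stacking axis
    x = np.dstack(x)
    return np.amax(x, axis=2) == np.amin(x, axis=2)

# Five 4x4 arrays of 0s and 1s, so that some cells match by chance
rng = np.random.default_rng(0)
x = [rng.integers(0, 2, size=(4, 4)) for _ in range(5)]

expected = loop_equal(x)
agree = all(np.array_equal(expected, f(x)) for f in (diff_equal, minmax_equal))
print(agree)
```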

If the arrays are already in a single 3D numpy array (e.g., from using x = np.dstack(x) in the above), then adapting the function definitions above accordingly (again including the min==max approach) gives:

def g0(x):
    r = x[:,:,0] == x[:,:,1]
    for iy in range(2, x.shape[2]):  # remaining layers, starting at index 2
        r &= x[:,:,0]==x[:,:,iy]

def g1(x):   # from @Divakar
    r = ~np.any(np.diff(x,axis=2),axis=2)

def g2(x):
    r = np.amax(x,axis=2)==np.amin(x,axis=2)

which yields:

1000 3 1000
g0:  3.9761030674      # loop
g1:  0.0599548816681   # diff
g2:  0.0313589572906   # min max

10 1000 100
g0:  10.7617051601     # loop
g1:  10.881870985      # diff
g2:  9.66712999344     # min max

Note also that for a list of large arrays f0 takes about 2.4 s, while for a pre-built 3D array g0, g1 and g2 all take about 10 s, so if the input arrays are large, the fastest approach, by about 4x, is to keep them separate in a list. I find this a bit surprising and guess that it might be due to cache swapping (or bad code?), but I'm not sure anyone really cares, so I'll stop here.


answered Sep 21 '22 14:09

tom10

Concatenate along the third axis with np.dstack and take differences along that axis with np.diff, so that identical values show up as zeros. Then check for positions where all differences are zero with ~np.any. This gives a one-liner solution like so -

~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)

Sample run -

In [39]: a
Out[39]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [40]: b
Out[40]: 
array([[5, 6, 7],
       [4, 2, 6],
       [7, 8, 9]])

In [41]: c
Out[41]: 
array([[2, 3, 4],
       [4, 5, 6],
       [1, 2, 5]])

In [42]: ~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)
Out[42]: 
array([[False, False, False],
       [ True, False,  True],
       [False, False, False]], dtype=bool)
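
For readers who want to see the intermediate steps, the one-liner can be unpacked as in the sketch below. The variable names stacked, deltas and all_same are only illustrative and do not appear in the answer:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# Stack along a new third axis: shape (3, 3, 3)
stacked = np.dstack((a, b, c))

# Differences between neighbouring layers: b - a and c - b, shape (3, 3, 2)
deltas = np.diff(stacked, axis=2)

# A cell matches in all arrays when none of its differences is nonzero
all_same = ~np.any(deltas, axis=2)

print(stacked.shape, deltas.shape)
print(all_same)
```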

answered Sep 21 '22 14:09

Divakar

ZacharyST
