I have n matrices of the same size and want to see how many cells are equal to each other across all matrices. Code:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[5,6,7], [4,2,6], [7, 8, 9]])
c = np.array([[2,3,4],[4,5,6],[1,2,5]])
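# Intuition is below but is wrong: a == b == c means (a == b) and (b == c),
# and "and" raises ValueError on a multi-element boolean array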
a == b == c
How do I get Python to return a value of 2 (cells 2,1 and 2,3 match in all 3 matrices) or an array of [[False, False, False], [True, False, True], [False, False, False]]?
To check whether two NumPy arrays A and B are equal, use the comparison operator (==) to form a boolean comparison array, then check whether all elements of that array are True.
To compare two string arrays element-wise with a comparison operator, use numpy.char.compare_chararrays() in NumPy. Its first two arguments, a1 and a2, are the two input string arrays of the same shape to be compared; the third, cmp, is the comparison operator given as a string (such as "=="), and the fourth, rstrip, says whether trailing spaces are stripped before comparing.
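A minimal sketch of that call, assuming the NumPy signature numpy.char.compare_chararrays(a1, a2, cmp, rstrip); the two word arrays are invented for illustration:

```python
import numpy as np

# Two string arrays of the same shape (made-up data).
words1 = np.array(["apple", "pear", "fig"])
words2 = np.array(["apple", "plum", "fig"])

# Element-wise "==" comparison; rstrip=True strips trailing spaces first.
same = np.char.compare_chararrays(words1, words2, "==", True)
print(same)
```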
We generally use the == operator to compare two NumPy arrays, which generates a new boolean array object. Calling ndarray.all() on that array returns True if the two NumPy arrays are equal element-wise.
When used with arrays, the == operator returns an array with the same shape as the inputs; it contains True at each index where the elements of both arrays are equal, and False everywhere else.
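For illustration, here is that pattern on the a and b arrays from the question. np.array_equal, shown as well, is a standard NumPy convenience function that also checks shapes; it is not mentioned in the text above:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])

mask = a == b          # boolean array with the same shape as a and b
print(mask)
print(mask.all())      # False: not every element matches

# np.array_equal checks shape and contents in one call.
print(np.array_equal(a, a.copy()))  # True
```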
You can do (the parentheses are needed because & binds more tightly than ==):
(a == b) & (b==c)
[[False False False]
[ True False True]
[False False False]]
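The question also asks for the count (2). Since True counts as 1, counting the True cells of that mask gives it. A short sketch reusing the question's arrays, with c written as a nested list; np.count_nonzero and np.argwhere are standard NumPy functions chosen here for the count and the 0-based positions:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# True wherever all three arrays agree.
mask = (a == b) & (b == c)

count = np.count_nonzero(mask)   # number of matching cells
positions = np.argwhere(mask)    # 0-based (row, column) of each match
print(count)  # 2
print(positions)
```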
For n items in, say, a list like x=[a, b, c, a, b, c], one could do:
r = x[0] == x[1]
for temp in x[2:]:
    r &= x[0]==temp
The result is now in r.
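The same loop wrapped in a reusable function, as a sketch; the name all_equal_mask is hypothetical, and it assumes at least two arrays of identical shape:

```python
import numpy as np

def all_equal_mask(arrays):
    """Boolean mask that is True where every array in `arrays` agrees.

    Hypothetical helper; assumes at least two arrays of the same shape.
    """
    first = arrays[0]
    result = first == arrays[1]   # new boolean array, safe to update in place
    for other in arrays[2:]:
        result &= first == other  # AND in each remaining comparison
    return result

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

x = [a, b, c, a, b, c]
mask = all_equal_mask(x)
print(mask)
print(np.count_nonzero(mask))  # 2
```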
If the arrays are already stacked along the third axis of a 3D numpy array, one could also use:
np.amax(x,axis=2)==np.amin(x,axis=2)
The idea behind this line is that, although it would be ideal to have an equal function with an axis argument, there isn't one, so this line relies on the fact that if amin == amax along the axis, then all elements along that axis are equal.
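A sketch of the min/max idea on the question's arrays, stacked along the third axis with np.dstack. The np.ptp (peak-to-peak, i.e. max minus min) variant is an equivalent formulation added here, not part of the original answer:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# Stack along a third axis: stacked[i, j, k] is cell (i, j) of the k-th array.
stacked = np.dstack((a, b, c))
print(stacked.shape)  # (3, 3, 3)

# Every value along axis 2 is equal exactly when the max equals the min.
mask = np.amax(stacked, axis=2) == np.amin(stacked, axis=2)

# Equivalent: the peak-to-peak range (max - min) along axis 2 is zero.
mask_ptp = np.ptp(stacked, axis=2) == 0
print(mask)
```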
If the arrays to be compared aren't already in a 3D numpy array (and won't be later), looping over the list is a fast and easy approach. Although I generally agree with avoiding Python loops for NumPy arrays, this seems like a case where it's easier and faster (see below) to use a Python loop, since the loop runs along a single axis and it's easy to accumulate the comparisons in place. Here's a timing test:
from timeit import timeit
import numpy as np

def f0(x):
    # loop: AND together each array's comparison with the first one
    r = x[0] == x[1]
    for y in x[2:]:
        r &= x[0] == y
    return r

def f1(x): # from @Divakar
    # stack along a third axis; all-zero differences mean all equal
    return ~np.any(np.diff(np.dstack(x), axis=2), axis=2)

def f2(x):
    # stack along a third axis; equal wherever max == min
    x = np.dstack(x)
    return np.amax(x, axis=2) == np.amin(x, axis=2)

# speed test
for n, size, reps in ((1000, 3, 1000), (10, 1000, 100)):
    x = [np.ones((size, size)) for i in range(n)]
    print(n, size, reps)
    for name in ("f0", "f1", "f2"):
        print(name + ": ", timeit(name + "(x)", "from __main__ import x, f0, f1, f2", number=reps))
    print()
1000 3 1000
f0: 1.14673900604 # loop
f1: 3.93413209915 # diff
f2: 3.93126702309 # min max
10 1000 100
f0: 2.42633581161 # loop
f1: 27.1066679955 # diff
f2: 25.9518558979 # min max
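Because the timing test uses np.ones, every cell matches, so the three functions are never asked to find a mismatch. A quick sanity check, assuming random 0/1 integer arrays (seeded for repeatability) are a fair stand-in, that f0, f1 and f2 return the same mask:

```python
import numpy as np

def f0(x):
    r = x[0] == x[1]
    for y in x[2:]:
        r &= x[0] == y
    return r

def f1(x):
    return ~np.any(np.diff(np.dstack(x), axis=2), axis=2)

def f2(x):
    x = np.dstack(x)
    return np.amax(x, axis=2) == np.amin(x, axis=2)

# Assumption: random 0/1 arrays, so some cells match in all arrays and most don't.
rng = np.random.default_rng(0)
x = [rng.integers(0, 2, size=(50, 50)) for _ in range(4)]

r0, r1, r2 = f0(x), f1(x), f2(x)
print(np.array_equal(r0, r1), np.array_equal(r0, r2))
print(np.count_nonzero(r0), "of", r0.size, "cells match in all arrays")
```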
If the arrays are already in a single 3D numpy array (e.g., from x = np.dstack(x) in the code above), then modifying the function definitions above accordingly gives:
def g0(x):
    # loop over the third axis, comparing each slice with the first one
    r = x[:,:,0] == x[:,:,1]
    for iy in range(2, x.shape[2]):
        r &= x[:,:,0] == x[:,:,iy]
    return r
def g1(x): # from @Divakar
    return ~np.any(np.diff(x,axis=2),axis=2)
def g2(x):
    return np.amax(x,axis=2)==np.amin(x,axis=2)
which yields:
1000 3 1000
g0: 3.9761030674 # loop
g1: 0.0599548816681 # diff
g2: 0.0313589572906 # min max
10 1000 100
g0: 10.7617051601 # loop
g1: 10.881870985 # diff
g2: 9.66712999344 # min max
Note also that for a list of large arrays f0 takes about 2.4 s, while for a pre-built 3D array g0, g1 and g2 all take about 10 s, so if the input arrays are large, the fastest approach, by about 4x, is to store them separately in a list. I find this a bit surprising and guess that it might be due to cache effects (or bad code?), but I'm not sure anyone really cares, so I'll stop here.
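One way to probe the cache guess, offered as an untested assumption with no timings claimed: stack the arrays along a new first axis with np.stack(..., axis=0) instead of np.dstack, so that each x[i] is one contiguous block in memory, and run the same loop on that layout. A sketch (h0 is a hypothetical name):

```python
import numpy as np

def h0(x):
    # x has shape (n, rows, cols); x[i] is the i-th original array.
    r = x[0] == x[1]
    for i in range(2, x.shape[0]):
        r &= x[0] == x[i]
    return r

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

x_first = np.stack((a, b, c), axis=0)  # arrays along axis 0
x_last = np.dstack((a, b, c))           # arrays along axis 2

# In C order, a slice along the first axis is one contiguous block,
# while a slice along the last axis is strided.
print(x_first[1].flags["C_CONTIGUOUS"])       # True
print(x_last[:, :, 1].flags["C_CONTIGUOUS"])  # False

mask = h0(x_first)
print(mask)
```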
Concatenate along the third axis with np.dstack and take differences along that axis with np.diff, so that identical values show up as zeros. Then check for cells where all differences are zero with ~np.any. Thus, you would have a one-liner solution like so -
~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)
Sample run -
In [39]: a
Out[39]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [40]: b
Out[40]:
array([[5, 6, 7],
[4, 2, 6],
[7, 8, 9]])
In [41]: c
Out[41]:
array([[2, 3, 4],
[4, 5, 6],
[1, 2, 5]])
In [42]: ~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)
Out[42]:
array([[False, False, False],
[ True, False, True],
[False, False, False]], dtype=bool)
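The one-liner can be wrapped in a small helper that accepts any number of arrays and also yields the count asked for in the question; a sketch, where diff_equal_mask is a hypothetical name:

```python
import numpy as np

def diff_equal_mask(arrays):
    """Hypothetical wrapper around the np.diff one-liner for any number of arrays."""
    stacked = np.dstack(arrays)        # shape (rows, cols, n)
    steps = np.diff(stacked, axis=2)   # differences between adjacent arrays
    return ~np.any(steps, axis=2)      # True where every difference is zero

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

mask = diff_equal_mask((a, b, c))
count = int(mask.sum())  # True counts as 1
print(mask)
print(count)  # 2
```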