I have two symmetric (item co-occurrence) matrices A and B and want to find out if they describe the same co-occurrence, only with the row/column labels permuted. (The same permutation has to be applied to rows and columns to keep the symmetry/co-occurrence property)
For example these two matrices should be equal in my test:
a = np.array([
#1 #2 #3 #4 #5 #6 #7
[0, 1, 1, 0, 0, 0, 1], #1
[1, 0, 1, 2, 1, 1, 2], #2
[1, 1, 0, 0, 0, 0, 1], #3
[0, 2, 0, 0, 4, 0, 4], #4
[0, 1, 0, 4, 0, 1, 2], #5
[0, 1, 0, 0, 1, 0, 0], #6
[1, 2, 1, 4, 2, 0, 0] #7
])
b = np.array([
#5 #7 #1,3#3,1#2 #4 #6
[0, 2, 0, 0, 1, 4, 1], #5
[2, 0, 1, 1, 2, 4, 0], #7
[0, 1, 0, 1, 1, 0, 0], #1,3 could be either
[0, 1, 1, 0, 1, 0, 0], #1,3 could be either
[1, 2, 1, 1, 0, 2, 1], #2
[4, 4, 0, 0, 2, 0, 0], #4
[1, 0, 0, 0, 1, 0, 0] #6
])
I currently test if the eigenvalues are the same using numpy.linalg.eigvals
(I am not even sure this is a sufficient condition), but I would like to find a test which doesn't involve numerical accuracy, since I am dealing with integers here.
Here's a vectorized solution based on sorting
and leveraging searchsorted
-
import pandas as pd
# Sort rows for a and b
aS = np.sort(a,axis=1)
bS = np.sort(b,axis=1)
# Scale down each row to a scalar each
scale = np.r_[(np.maximum(aS.max(0),bS.max(0))+1)[::-1].cumprod()[::-1][1:],1]
aS1D = aS.dot(scale)
bS1D = bS.dot(scale)
# Use searchsorted to get the correspondence on indexing
sidx = aS1D.argsort()
searchsorted_idx = np.searchsorted(aS1D,bS1D,sorter=sidx)
searchsorted_idx[searchsorted_idx==len(aS1D)] = len(aS1D)-1
df = pd.DataFrame({'A':searchsorted_idx})
new_order = sidx[df.groupby('A').cumcount().values+searchsorted_idx]
# new_order is the permuted order, i.e. [5, 7, 1, 3, 2, 4, 6]
# Finally index into a with the new_order and compare against b
out = np.array_equal(a[new_order[:,None], new_order],b)
I will assume that you have the list of permutation of the rows/columns of a
which gives the b
, e.g. something like this
p = np.array([5, 7, 1, 3, 2, 4, 6]) - 1
Then you can simply do the following on a
a_p = a[p]
a_p = a_p[:, p]
and check if b
and the permuted a_p
are equal:
(a_p == b).all()
Edit: since you don't have a list like the above p
, you could (at least for small arrays a
and b
) generate the permutations of the indices and check for each one:
from itertools import permutations
def a_p(a, b, p):
p = np.array(p)
a_p = a[p]
a_p = a_p[:, p]
return a_p
for p in permutations(range(a.shape[0])):
if (a_p(a, b, p) == b).all():
print('True')
break
else:
print('False')
Note that this brute-force method works for non-symmetric matrices as well. But since the number of permutations is huge for large arrays a
and b
, this method can be very slow. So your solution with computing eigenvalues is much better.
Here's a benchmark:
def Yduqoli(a, b):
''' I suppose your solution is similar'''
if (np.array(np.unique(a, return_counts=True)) == np.array(np.unique(b, return_counts=True))).all():
a_eigs = np.sort(np.linalg.eigvals(a))
b_eigs = np.sort(np.linalg.eigvals(b))
return np.allclose(a_eigs, b_eigs)
else:
return False
def AndyK(a, b):
for p in permutations(range(a.shape[0])):
if (a_p(a, b, p) == b).all():
return True
return False
%timeit AndyK(a,b)
103 ms ± 4.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit Yduqoli(a,b)
408 µs ± 65.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
where I used the symmetric matrices a
and b
provided by the OP.
Update: As mentioned by Paul Panzer, simply checking the eigenvalues can give incorrect result in some cases, e.g. a = np.array([[4, 0], [0, 0]])
, b = np.array([[2, 2], [2, 2]])
have the same eigenvalues, but cannot be shuffled one into the other. So we first need to check if the arrays a
and b
have the same elements (regardless their positions).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With