 

finding identical rows and columns in a numpy array

I have a boolean array of n×n elements and I want to check whether any row is identical to another. If there are any identical rows, I want to check whether the corresponding columns are also identical.

Here is an example:

import numpy as np

A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])

I would like the program to find that the first and third rows are identical, and then check whether the first and third columns are also identical, which in this case they are.

asked Feb 13 '23 03:02 by cgog


2 Answers

You can use np.array_equal():

import numpy as np

for i in range(len(A)):  # generate each pair (i, j) with i < j
    for j in range(i + 1, len(A)):
        if np.array_equal(A[i], A[j]):  # compare rows i and j
            if np.array_equal(A[:, i], A[:, j]):  # then compare columns i and j
                print(i, j)

Or, using itertools.combinations():

import itertools

for i, j in itertools.combinations(range(len(A)), 2):
    # rows i and j identical, and columns i and j identical
    if np.array_equal(A[i], A[j]) and np.array_equal(A[:, i], A[:, j]):
        print((i, j))
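Both loops can also be replaced by broadcasting, comparing every row with every other row in one step. This is a sketch, not part of the original answer: the function name `identical_row_col_pairs` is hypothetical, and it builds an n×n×n boolean intermediate, so it is only sensible for modest n.

```python
import numpy as np

def identical_row_col_pairs(A):
    """Return (i, j) pairs, i < j, where rows i and j AND columns i and j are identical.

    Hypothetical helper: a broadcasting version of the pairwise loops.
    """
    A = np.asarray(A)
    # eq_rows[i, j] is True when row i equals row j in every position
    eq_rows = (A[:, None, :] == A[None, :, :]).all(axis=2)
    # the same test on the transpose compares columns
    eq_cols = (A.T[:, None, :] == A.T[None, :, :]).all(axis=2)
    # strict upper triangle: each pair once, and never i == j
    both = np.triu(eq_rows & eq_cols, k=1)
    return [tuple(int(k) for k in p) for p in np.argwhere(both)]

A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])

print(identical_row_col_pairs(A))  # [(0, 2)]
```

The trade-off is memory for speed: the loops compare pairs one at a time, while this version materializes all n² comparisons at once.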
answered Feb 15 '23 10:02 by Esther Martinez


Start with the usual trick for applying np.unique to the rows of a 2D array (viewing each row as a single np.void element), and have it return the index pairs of rows that occur exactly twice:

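import numpy as np

def unique_pairs(arr):
    # view each row as one opaque np.void element so np.unique compares whole rows
    arr = np.ascontiguousarray(arr)
    uview = arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[1])))
    uvals, uidx = np.unique(uview, return_inverse=True)
    uidx = uidx.ravel()  # newer NumPy may return the inverse in the (n, 1) input shape
    # labels of rows that occur exactly twice
    pos = np.where(np.bincount(uidx) == 2)[0]

    pairs = []
    for p in pos:
        pairs.append(np.where(uidx == p)[0])

    # reshape so that an empty result still has shape (0, 2)
    return np.array(pairs).reshape(-1, 2)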

We can then do the following:

row_pairs = unique_pairs(A)
col_pairs = unique_pairs(A.T)

for pair in row_pairs:
    # report a row pair only if the same column pair is also identical
    if np.any(np.all(pair == col_pairs, axis=1)):
        print(pair)

which prints:

[0 2]

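Of course there are still quite a few optimizations left to do, but the main point is using np.unique. How efficient this method is compared with the pairwise loops depends heavily on the array size: sorting the rows with np.unique avoids comparing every pair of rows, which pays off for larger arrays, while for small arrays the simple loops may be just as fast.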
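Note that `np.bincount(uidx) == 2` only picks up rows that occur exactly twice, so a row repeated three or more times is missed. Since NumPy 1.13, `np.unique` also accepts `axis=0`, which compares whole rows without the void view. A sketch that handles groups of any size (the helper names `duplicate_pairs` and `matching_row_col_pairs` are hypothetical) might look like:

```python
import itertools
import numpy as np

def duplicate_pairs(arr):
    """Return the set of all pairs (i, j), i < j, of identical rows in a 2D array."""
    _, inverse = np.unique(arr, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # keep it 1D regardless of NumPy version
    pairs = set()
    # labels shared by two or more rows
    for label in np.flatnonzero(np.bincount(inverse) >= 2):
        members = np.flatnonzero(inverse == label)  # ascending row indices
        # every pair inside a group of identical rows
        pairs.update((int(i), int(j)) for i, j in itertools.combinations(members, 2))
    return pairs

def matching_row_col_pairs(A):
    """Pairs whose rows are identical and whose columns are identical as well."""
    A = np.asarray(A)
    return sorted(duplicate_pairs(A) & duplicate_pairs(A.T))

A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])

print(matching_row_col_pairs(A))  # [(0, 2)]
```

Intersecting the row-pair set with the column-pair set replaces the `np.any(np.all(...))` test and also works when either set is empty.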

answered Feb 15 '23 09:02 by Daniel