Find indexes of matching rows in two 2-D arrays

Tags: python, numpy

Suppose that I have two 2-D arrays as follows:

array([[3, 3, 1, 0],
       [2, 3, 1, 3],
       [0, 2, 3, 1],
       [1, 0, 2, 3],
       [3, 1, 0, 2]], dtype=int8)

array([[0, 3, 3, 1],
       [0, 2, 3, 1],
       [1, 0, 2, 3],
       [3, 1, 0, 2],
       [3, 3, 1, 0]], dtype=int8)

Some rows in each array have a corresponding row that matches by value (but not necessarily by index) in the other array, and some don't.

I would like to find an efficient way to return the pairs of indexes in the two arrays that correspond to matching rows. Expressed as tuples (index in the first array, index in the second), I would expect the result to be:

(0,4)
(2,1)
(3,2)
(4,3)

llevar asked Nov 26 '13 23:11


2 Answers

I can't think of a NumPy-specific way to do it, but here's what I would do with regular lists:

>>> L1 = [[3, 3, 1, 0],
...       [2, 3, 1, 3],
...       [0, 2, 3, 1],
...       [1, 0, 2, 3],
...       [3, 1, 0, 2]]
>>> L2 = [[0, 3, 3, 1],
...       [0, 2, 3, 1],
...       [1, 0, 2, 3],
...       [3, 1, 0, 2],
...       [3, 3, 1, 0]]
>>> # Map each row (as a hashable tuple) to its index; note this rebinds L1 to a dict
>>> L1 = {tuple(row):i for i,row in enumerate(L1)}
>>> answer = []
>>> for i,row in enumerate(L2):
...   if tuple(row) in L1:
...     answer.append((L1[tuple(row)], i))
... 
>>> answer
[(2, 1), (3, 2), (4, 3), (0, 4)]
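
The same dictionary idea can be sketched as a function that takes the NumPy arrays from the question directly. The name `match_rows` and the variable names `x` and `y` are hypothetical, chosen for illustration. One assumption to be aware of: if the first array contains duplicate rows, only the last index of each duplicate is kept.

```python
import numpy as np

def match_rows(a, b):
    """Return (i, j) pairs such that row i of `a` equals row j of `b`."""
    # Map each row of `a` (as a hashable tuple) to its index.
    # With duplicate rows in `a`, the last index wins.
    index_of = {tuple(row): i for i, row in enumerate(a.tolist())}
    pairs = []
    for j, row in enumerate(b.tolist()):
        i = index_of.get(tuple(row))
        if i is not None:
            pairs.append((i, j))
    return pairs

x = np.array([[3, 3, 1, 0],
              [2, 3, 1, 3],
              [0, 2, 3, 1],
              [1, 0, 2, 3],
              [3, 1, 0, 2]], dtype=np.int8)
y = np.array([[0, 3, 3, 1],
              [0, 2, 3, 1],
              [1, 0, 2, 3],
              [3, 1, 0, 2],
              [3, 3, 1, 0]], dtype=np.int8)

pairs = match_rows(x, y)
print(sorted(pairs))  # [(0, 4), (2, 1), (3, 2), (4, 3)]
```

The pairs come back ordered by index in the second array, so sorting them gives the order used in the question.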

inspectorG4dget answered Oct 13 '22 02:10


This is an all-NumPy solution, though it is not necessarily better than an iterative Python one, since it still has to compare every pair of rows. Here x and y are the two arrays from the question.

In [53]: np.array(np.all((x[:,None,:]==y[None,:,:]),axis=-1).nonzero()).T.tolist()
Out[53]: [[0, 4], [2, 1], [3, 2], [4, 3]]

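The intermediate array has shape (5,5,4): one boolean for each combination of a row of x, a row of y, and a column. np.all over the last axis reduces it to a (5,5) array: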

array([[False, False, False, False,  True],
       [False, False, False, False, False],
       [False,  True, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False]], dtype=bool)

The rest is just extracting the indices where this is True.
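
The one-liner can be unpacked step by step, as in the sketch below. The intermediate names `eq` and `rows_match` are hypothetical, and `np.argwhere` is swapped in for the `nonzero()` plus transpose combination; it should produce the same list of index pairs.

```python
import numpy as np

x = np.array([[3, 3, 1, 0],
              [2, 3, 1, 3],
              [0, 2, 3, 1],
              [1, 0, 2, 3],
              [3, 1, 0, 2]], dtype=np.int8)
y = np.array([[0, 3, 3, 1],
              [0, 2, 3, 1],
              [1, 0, 2, 3],
              [3, 1, 0, 2],
              [3, 3, 1, 0]], dtype=np.int8)

# Broadcast (5, 1, 4) against (1, 5, 4): element-wise comparison of
# every row of x with every row of y.
eq = x[:, None, :] == y[None, :, :]
print(eq.shape)          # (5, 5, 4)

# A pair of rows matches only if all four columns are equal.
rows_match = eq.all(axis=-1)
print(rows_match.shape)  # (5, 5)

# Coordinates of the True entries, one [i, j] pair per matching row.
pairs = np.argwhere(rows_match).tolist()
print(pairs)             # [[0, 4], [2, 1], [3, 2], [4, 3]]
```

Note that the intermediate array grows as len(x) * len(y) * number of columns, so memory use can become the limiting factor for large inputs.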

In crude tests, this takes 47.8 µs; the other answer, with the L1 dictionary, takes 38.3 µs; and a third version with a double loop takes 496 µs.
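
A comparison along these lines could be reproduced with `timeit`, as sketched below. The three functions are reconstructions and not necessarily the code that was actually timed; the double loop in particular is an assumption, since it is not shown above. All function names are hypothetical, and absolute timings will differ by machine and NumPy version.

```python
import timeit
import numpy as np

x = np.array([[3, 3, 1, 0], [2, 3, 1, 3], [0, 2, 3, 1],
              [1, 0, 2, 3], [3, 1, 0, 2]], dtype=np.int8)
y = np.array([[0, 3, 3, 1], [0, 2, 3, 1], [1, 0, 2, 3],
              [3, 1, 0, 2], [3, 3, 1, 0]], dtype=np.int8)

def broadcast_pairs(a, b):
    # Compare every row of a with every row of b in one vectorized step.
    return np.argwhere((a[:, None, :] == b[None, :, :]).all(axis=-1)).tolist()

def dict_pairs(a, b):
    # Hash the rows of a, then look up each row of b.
    index_of = {tuple(row): i for i, row in enumerate(a.tolist())}
    return [(index_of[tuple(row)], j)
            for j, row in enumerate(b.tolist()) if tuple(row) in index_of]

def loop_pairs(a, b):
    # Plain double loop over all row pairs (assumed form of the third variant).
    return [(i, j)
            for i, row_a in enumerate(a)
            for j, row_b in enumerate(b)
            if np.array_equal(row_a, row_b)]

runs = 200
for func in (broadcast_pairs, dict_pairs, loop_pairs):
    seconds = timeit.timeit(lambda: func(x, y), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.1f} us per call")
```

On inputs this small the results mostly reflect call overhead, so it would be worth repeating the measurement with arrays of realistic size before choosing an approach.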

hpaulj answered Oct 13 '22 04:10