Suppose that I have two 2-D arrays as follows:
array([[3, 3, 1, 0],
[2, 3, 1, 3],
[0, 2, 3, 1],
[1, 0, 2, 3],
[3, 1, 0, 2]], dtype=int8)
array([[0, 3, 3, 1],
[0, 2, 3, 1],
[1, 0, 2, 3],
[3, 1, 0, 2],
[3, 3, 1, 0]], dtype=int8)
Some rows in each array have a corresponding row that matches by value (but not necessarily by index) in the other array, and some don't.
I would like to find an efficient way to return pairs of indexes in the two arrays that correspond to matching rows. If they were returned as tuples, I would expect:
(0,4)
(2,1)
(3,2)
(4,3)
I can't think of a numpy specific way to do it, but here's what I would do with regular lists:
>>> L1= [[3, 3, 1, 0],
... [2, 3, 1, 3],
... [0, 2, 3, 1],
... [1, 0, 2, 3],
... [3, 1, 0, 2]]
>>> L2 = [[0, 3, 3, 1],
... [0, 2, 3, 1],
... [1, 0, 2, 3],
... [3, 1, 0, 2],
... [3, 3, 1, 0]]
>>> L1 = {tuple(row):i for i,row in enumerate(L1)}
>>> answer = []
>>> for i,row in enumerate(L2):
...     if tuple(row) in L1:
...         answer.append((L1[tuple(row)], i))
...
>>> answer
[(2, 1), (3, 2), (4, 3), (0, 4)]
This is an all-numpy solution, not that it is necessarily better than an iterative Python one. It still has to look at all combinations.
In [53]: np.array(np.all((x[:,None,:]==y[None,:,:]),axis=-1).nonzero()).T.tolist()
Out[53]: [[0, 4], [2, 1], [3, 2], [4, 3]]
The intermediate array has shape (5,5,4). The np.all reduces it to:
array([[False, False, False, False, True],
[False, False, False, False, False],
[False, True, False, False, False],
[False, False, True, False, False],
[False, False, False, True, False]], dtype=bool)
The rest is just extracting the indices where this is True.
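To make the steps easier to follow, here is the same one-liner broken into pieces. This is only a sketch: the names elementwise, row_match and pairs are mine, and I use np.argwhere as a shorthand for the nonzero/transpose combination above.

```python
import numpy as np

x = np.array([[3, 3, 1, 0],
              [2, 3, 1, 3],
              [0, 2, 3, 1],
              [1, 0, 2, 3],
              [3, 1, 0, 2]], dtype=np.int8)
y = np.array([[0, 3, 3, 1],
              [0, 2, 3, 1],
              [1, 0, 2, 3],
              [3, 1, 0, 2],
              [3, 3, 1, 0]], dtype=np.int8)

# Broadcast (5, 1, 4) against (1, 5, 4): every row of x is compared
# element by element with every row of y.
elementwise = x[:, None, :] == y[None, :, :]   # shape (5, 5, 4)

# Two rows match only if all 4 of their elements agree.
row_match = elementwise.all(axis=-1)           # shape (5, 5)

# A True at [i, j] means x[i] equals y[j]; argwhere lists those (i, j) pairs.
pairs = np.argwhere(row_match)
print(pairs.tolist())   # [[0, 4], [2, 1], [3, 2], [4, 3]]
```

Note that the intermediate array grows as len(x) * len(y) * row_length, so for large inputs the memory cost may matter more than the speed.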
In crude tests, this times at 47.8 us; the other answer with the L1 dictionary at 38.3 us; and a third with a double loop at 496 us.
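If you want to repeat the comparison, something like the following should work. It is a rough harness, not the exact code I timed: the function names are made up for illustration, and the double-loop version is my guess at what such an approach looks like. Expect different absolute numbers on your machine.

```python
import timeit
import numpy as np

x = np.array([[3, 3, 1, 0], [2, 3, 1, 3], [0, 2, 3, 1],
              [1, 0, 2, 3], [3, 1, 0, 2]], dtype=np.int8)
y = np.array([[0, 3, 3, 1], [0, 2, 3, 1], [1, 0, 2, 3],
              [3, 1, 0, 2], [3, 3, 1, 0]], dtype=np.int8)

def broadcast_pairs(a, b):
    # All-numpy version: compare every row of a with every row of b at once.
    return np.argwhere((a[:, None, :] == b[None, :, :]).all(axis=-1)).tolist()

def dict_pairs(a, b):
    # Dictionary version: map each row of a (as a tuple) to its index.
    # Duplicate rows in a would keep only the last index.
    lookup = {tuple(row): i for i, row in enumerate(a.tolist())}
    return [(lookup[tuple(row)], j)
            for j, row in enumerate(b.tolist()) if tuple(row) in lookup]

def loop_pairs(a, b):
    # Double loop (assumed form): test each pair of rows explicitly.
    return [(i, j)
            for i, row_a in enumerate(a)
            for j, row_b in enumerate(b)
            if np.array_equal(row_a, row_b)]

for fn in (broadcast_pairs, dict_pairs, loop_pairs):
    seconds = timeit.timeit(lambda: fn(x, y), number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} us per call")
```

With arrays this small the timings mostly measure call overhead, so it is worth rerunning with inputs closer to your real sizes before picking an approach.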