The similar question has been asked, but none of the answers quite do what I need - some allow multidimensional searches (aka 'rows' option in matlab) but dont return the index. Some return the index but dont allow rows. My arrays are very large (1M x 2) and I have bee successful in making a loop that works, but obviously that is very slow. In matlab, the built-in ismember function takes about 10 seconds.
Here is what I am looking for:
a=np.array([[4, 6],[2, 6],[5, 2]])
b=np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
The exact matlab function that does the trick is:
[~,index] = ismember(a,b,'rows')
where
index = [6, 3, 9]
import numpy as np
def asvoid(arr):
"""
View the array as dtype np.void (bytes)
This views the last axis of ND-arrays as bytes so you can perform comparisons on
the entire row.
http://stackoverflow.com/a/16840350/190597 (Jaime, 2013-05)
Warning: When using asvoid for comparison, note that float zeros may compare UNEQUALLY
>>> asvoid([-0.]) == asvoid([0.])
array([False], dtype=bool)
"""
arr = np.ascontiguousarray(arr)
return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
def in1d_index(a, b):
voida, voidb = map(asvoid, (a, b))
return np.where(np.in1d(voidb, voida))[0]
a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
print(in1d_index(a, b))
prints
[2 5 8]
This would be equivalent to Matlab's [3, 6, 9], since Python uses 0-based indexing.
Some caveats:
a
in b
.asvoid([-0.]) == asvoid([0.])
returns
array([False])
.Despite the caveats, one might choose to use in1d_index
anyway for the sake of speed:
def ismember_rows(a, b):
# http://stackoverflow.com/a/22705773/190597 (ashg)
return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]
In [41]: a2 = np.tile(a,(2000,1))
In [42]: b2 = np.tile(b,(2000,1))
In [46]: %timeit in1d_index(a2, b2)
100 loops, best of 3: 8.49 ms per loop
In [47]: %timeit ismember_rows(a2, b2)
1 loops, best of 3: 5.55 s per loop
So in1d_index
is ~650x faster (for arrays of length in the low thousands), but again note the comparison is not exactly apples-to-apples since in1d_index
returns the indices in increasing order, while ismember_rows
returns the indices in the order rows of a
show up in b
.
import numpy as np
def ismember_rows(a, b):
'''Equivalent of 'ismember' from Matlab
a.shape = (nRows_a, nCol)
b.shape = (nRows_b, nCol)
return the idx where b[idx] == a
'''
return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]
a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
idx = ismember_rows(a, b)
print idx
print np.all(b[idx] == a)
array([5, 2, 8])
True
e...I used broadcasting
--------------------------[update]------------------------------
def ismember(a, b):
return np.flatnonzero(np.in1d(b[:,0], a[:,0]) & np.in1d(b[:,1], a[:,1]))
a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
a2 = np.tile(a,(2000,1))
b2 = np.tile(b,(2000,1))
%timeit timeit in1d_index(a2, b2)
# 100 loops, best of 3: 8.74 ms per loop
%timeit ismember(a2, b2)
# 100 loops, best of 3: 8.5 ms per loop
np.all(in1d_index(a2, b2) == ismember(a2, b2))
# True
as what unutbu said, the indices are returned in increasing order
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With