Pythonic and efficient way to do an elementwise "in" using numpy

I'm looking for an efficient way to get an array of booleans: given two arrays a and b of equal length, each element of the result is True if the corresponding element of a appears in the corresponding element of b.

For example, the following program:

import numpy
a = numpy.array([1, 2, 3, 4])
# dtype=object is needed for ragged rows in recent NumPy versions
b = numpy.array([[1, 2, 13], [2, 8, 9], [5, 6], [7]], dtype=object)
print(numpy.magic_function(a, b))

Should print

[True, True, False, False]

Keep in mind this function should be the equivalent of

[x in y for x, y in zip(a, b)]

but optimized with numpy for the case where a and b are large and each element of b is reasonably small.

Martín Fixman asked Jul 24 '15 19:07


1 Answer

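To take advantage of NumPy's broadcasting rules, you should first make b rectangular by padding its rows to equal length, which can be achieved using itertools.zip_longest (izip_longest in Python 2). Note that this also transposes b, so column c[:, i] holds the elements of b[i]:

from itertools import zip_longest
import numpy as np

# Transpose the ragged rows into columns; missing entries become NaN.
c = np.array(list(zip_longest(*b, fillvalue=np.nan)), dtype=float)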

resulting in:

array([[  1.,   2.,   5.,   7.],
       [  2.,   8.,   6.,  nan],
       [ 13.,   9.,  nan,  nan]])

Then np.isclose(c, a) gives a 2D array of Booleans indicating whether each element of c[:, i] is close to a[i]. By the broadcasting rules, a is compared against every row of c, and the NaN padding never matches:

array([[ True,  True, False, False],
       [False, False, False, False],
       [False, False, False, False]], dtype=bool)

Reducing this along the first axis gives your answer:

np.any(np.isclose(c, a), axis=0)
#array([ True,  True, False, False], dtype=bool)
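
For Python 3, the steps above can be put together roughly as follows. This is only a sketch under some assumptions: the helper name elementwise_in is made up for illustration, the data are numeric with no NaN values, and b has at least one non-empty row. It also swaps np.isclose for an exact == comparison, which should be fine for integers small enough to be represented exactly as floats (below 2**53); for genuinely floating-point data np.isclose is probably the better choice.

```python
from itertools import zip_longest

import numpy as np


def elementwise_in(a, b):
    """Return a boolean array r where r[i] is True if a[i] is in b[i].

    Hypothetical helper name. Assumes numeric data without NaN and
    at least one non-empty row in b.
    """
    a = np.asarray(a, dtype=float)
    # Transpose the ragged rows of b into columns, padding short rows with NaN.
    padded = np.array(list(zip_longest(*b, fillvalue=np.nan)), dtype=float)
    # padded has shape (longest_row, len(a)); broadcasting compares
    # padded[j, i] with a[i]. NaN is never equal to anything, so the
    # padding can never produce a match.
    return (padded == a).any(axis=0)


a = [1, 2, 3, 4]
b = [[1, 2, 13], [2, 8, 9], [5, 6], [7]]

result = elementwise_in(a, b)
# Reference result from the plain-Python version in the question.
expected = [x in y for x, y in zip(a, b)]
print(result.tolist())  # [True, True, False, False]
```

Keep in mind that the padded array has len(a) times the longest row of b entries, so this only pays off when the rows of b are short, as in the question.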
Saullo G. P. Castro answered Oct 23 '22 20:10