Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find all elements in a numpy 2-dimensional array that match a certain list?

I have a 2-dimensional NumPy array, for example:

array([[1, 1, 0, 2, 2],
       [1, 1, 0, 2, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

I would like to get all elements from that array which are in a certain list, for example (1, 3, 4). The desired result in the example case would be:

array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

I know that I can just do (as recommended here Numpy: find elements within range):

np.logical_or(
    np.logical_or(cc_labeled == 1, cc_labeled == 3),
    cc_labeled == 4
)

, but this will be only reasonably effective in the example case. In reality iteratively using for loop and numpy.logical_or turned out to be really slow since the list of possible values is in thousands (and numpy array has approximately the dimension of 1000 x 1000).

like image 545
Rauni Lillemets Avatar asked Mar 14 '23 13:03

Rauni Lillemets


2 Answers

Use np.in1d:

np.in1d(arr, [1,3,4]).reshape(arr.shape)

in1d, as the name suggest, operates on the flattened array, therefor you need to reshape after the operation.

like image 21
shx2 Avatar answered Mar 16 '23 02:03

shx2


You can use np.in1d -

A*np.in1d(A,[1,3,4]).reshape(A.shape)

Also, np.where could be used -

np.where(np.in1d(A,[1,3,4]).reshape(A.shape),A,0)

You can also use np.searchsorted to find such matches by using its optional 'side' argument with inputs as left and right and noting that for the matches, the searchsorted would output different results with these two inputs. Thus, an equivalent of np.in1d(A,[1,3,4]) would be -

M = np.searchsorted([1,3,4],A.ravel(),'left') != \
    np.searchsorted([1,3,4],A.ravel(),'right')

Thus, the final output would be -

out = A*M.reshape(A.shape)

Please note that if the input search list is not sorted, you need to use the optional argumentsorter with its argsort indices in np.searchsorted.

Sample run -

In [321]: A
Out[321]: 
array([[1, 1, 0, 2, 2],
       [1, 1, 0, 2, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

In [322]: A*np.in1d(A,[1,3,4]).reshape(A.shape)
Out[322]: 
array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

In [323]: np.where(np.in1d(A,[1,3,4]).reshape(A.shape),A,0)
Out[323]: 
array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

In [324]: M = np.searchsorted([1,3,4],A.ravel(),'left') != \
     ...:     np.searchsorted([1,3,4],A.ravel(),'right')
     ...: A*M.reshape(A.shape)
     ...: 
Out[324]: 
array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

Runtime tests and verify outputs -

In [309]: # Inputs
     ...: A = np.random.randint(0,1000,(400,500))
     ...: lst = np.sort(np.random.randint(0,1000,(100))).tolist()
     ...: 
     ...: def func1(A,lst):                         
     ...:   return A*np.in1d(A,lst).reshape(A.shape)
     ...: 
     ...: def func2(A,lst):                         
     ...:   return np.where(np.in1d(A,lst).reshape(A.shape),A,0)
     ...: 
     ...: def func3(A,lst):                         
     ...:   mask = np.searchsorted(lst,A.ravel(),'left') != \
     ...:          np.searchsorted(lst,A.ravel(),'right')
     ...:   return A*mask.reshape(A.shape)
     ...: 

In [310]: np.allclose(func1(A,lst),func2(A,lst))
Out[310]: True

In [311]: np.allclose(func1(A,lst),func3(A,lst))
Out[311]: True

In [312]: %timeit func1(A,lst)
10 loops, best of 3: 30.9 ms per loop

In [313]: %timeit func2(A,lst)
10 loops, best of 3: 30.9 ms per loop

In [314]: %timeit func3(A,lst)
10 loops, best of 3: 28.6 ms per loop
like image 98
Divakar Avatar answered Mar 16 '23 02:03

Divakar