
Numpy.where used with list of values

Tags: python, numpy

I have a 2D array and a 1D array. For each value in the 1D array, I want to find the two rows of the 2D array that contain it at least once, as follows:

import numpy as np

A = np.array([[0, 3, 1],
              [9, 4, 6],
              [2, 7, 3],
              [1, 8, 9],
              [6, 2, 7],
              [4, 8, 0]])

B = np.array([0,1,2,3])

results = []

for elem in B:
    results.append(np.where(A==elem)[0])

This works and produces the following list of arrays:

[array([0, 5], dtype=int64),
 array([0, 3], dtype=int64),
 array([2, 4], dtype=int64),
 array([0, 2], dtype=int64)]

But this is probably not the best way to proceed. Following the answers given in this question (Search Numpy array with multiple values), I tried the following solutions:

out1 = np.where(np.in1d(A, B))

num_arr = np.sort(B)
idx = np.searchsorted(B, A)
idx[idx==len(num_arr)] = 0 
out2 = A[A == num_arr[idx]]

But these give me incorrect values:

In [36]: out1
Out[36]: (array([ 0,  1,  2,  6,  8,  9, 13, 17], dtype=int64),)

In [37]: out2
Out[37]: array([0, 3, 1, 2, 3, 1, 2, 0])

Thanks for your help

Anthony Lethuillier asked Apr 23 '18 18:04


2 Answers

If you only need to know whether each row of A contains any element of B, regardless of which element it is, you can use the following:

input:

np.isin(A,B).sum(axis=1)>0 

output:

array([ True, False,  True,  True,  True,  True])
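
A possible variation on the line above, offered as a sketch rather than as part of the original answer: `any(axis=1)` expresses the same test without counting matches, and `np.flatnonzero` turns the boolean mask into row indices. The names `row_has_match` and `matching_rows` are made up for illustration.

```python
import numpy as np

A = np.array([[0, 3, 1],
              [9, 4, 6],
              [2, 7, 3],
              [1, 8, 9],
              [6, 2, 7],
              [4, 8, 0]])
B = np.array([0, 1, 2, 3])

# True for each row of A that contains at least one value from B
row_has_match = np.isin(A, B).any(axis=1)

# Indices of the rows flagged above
matching_rows = np.flatnonzero(row_has_match)
print(matching_rows)  # [0 2 3 4 5]
```

Like the original one-liner, this does not tell you which element of B matched in each row.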
denis_smyslov answered Nov 02 '22 18:11


Since you're dealing with a 2D array*, you can use broadcasting to compare B with a raveled version of A. This gives you the matching indices into the raveled array. You can then convert them back to indices in the original array using np.unravel_index.

In [50]: d = np.where(B[:, None] == A.ravel())[1]

In [51]: np.unravel_index(d, A.shape)
Out[51]: (array([0, 5, 0, 3, 2, 4, 0, 2]), array([0, 2, 2, 0, 0, 1, 1, 2]))
                       ^
               # expected result (row indices)
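
To recover the per-value grouping shown in the question, one option is to also keep the first array returned by np.where, since it records which element of B each match belongs to. This is only a sketch under that assumption; `b_idx`, `flat_idx`, `counts`, and `rows_per_elem` are illustrative names, not part of the answer.

```python
import numpy as np

A = np.array([[0, 3, 1],
              [9, 4, 6],
              [2, 7, 3],
              [1, 8, 9],
              [6, 2, 7],
              [4, 8, 0]])
B = np.array([0, 1, 2, 3])

# b_idx[k] is a position in B, flat_idx[k] the matching position in A.ravel()
b_idx, flat_idx = np.where(B[:, None] == A.ravel())
rows, cols = np.unravel_index(flat_idx, A.shape)

# Matches come out ordered by b_idx, so the row indices can be split
# into one group per element of B using the number of matches per element.
counts = np.bincount(b_idx, minlength=len(B))
rows_per_elem = np.split(rows, np.cumsum(counts)[:-1])

for elem, r in zip(B, rows_per_elem):
    print(elem, r)
```

For the arrays in the question this should give the same groups as the original loop: rows 0 and 5 for the value 0, rows 0 and 3 for 1, rows 2 and 4 for 2, and rows 0 and 2 for 3.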

* From the documentation: For 3-dimensional arrays this is certainly efficient in terms of lines of code, and, for small data sets, it can also be computationally efficient. For large data sets, however, the creation of the large 3-d array may result in sluggish performance. Also, broadcasting is a powerful tool for writing short and usually intuitive code that does its computations very efficiently in C. However, there are cases when broadcasting uses unnecessarily large amounts of memory for a particular algorithm. In these cases, it is better to write the algorithm's outer loop in Python. This may also produce more readable code, as algorithms that use broadcasting tend to become more difficult to interpret as the number of dimensions in the broadcast increases.
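
As a rough illustration of the trade-off described in this note, the sketch below contrasts the size of the broadcast intermediate with a plain Python loop over B. The names `mask` and `rows_per_elem` are illustrative.

```python
import numpy as np

A = np.array([[0, 3, 1],
              [9, 4, 6],
              [2, 7, 3],
              [1, 8, 9],
              [6, 2, 7],
              [4, 8, 0]])
B = np.array([0, 1, 2, 3])

# Broadcasting builds one boolean array of shape (len(B), A.size) in a single step.
mask = B[:, None] == A.ravel()
print(mask.shape, mask.nbytes)  # (4, 18) 72

# Looping over B in Python only ever holds one A-shaped mask at a time.
rows_per_elem = [np.nonzero(A == elem)[0] for elem in B]
```

The intermediate is tiny here, but it grows as len(B) * A.size, which is when the loop may become the better choice.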

Mazdak answered Nov 02 '22 20:11