This problem seems easy but I cannot quite get a nice-looking solution. I have two numpy arrays (A and B), and I want to get the indices of A where the elements of A are in B and also get the indices of A where the elements are not in B.
So, if
A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])
Currently I am using
C = np.searchsorted(A,B)
which takes advantage of the fact that A is in order, and gives me [1, 3, 5], the indices of the elements of A that are in B. This is great, but how do I get D = [0,2,4,6], the indices of the elements of A that are not in B?
searchsorted may give you a wrong answer if not every element of B is in A. You can use numpy.in1d instead:
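import numpy as np
A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6,8])
mask = np.in1d(A, B)        # True where the element of A is also in B
print(np.where(mask)[0])    # indices of A whose elements are in B
print(np.where(~mask)[0])   # indices of A whose elements are not in B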
output is:
[1 3 5]
[0 2 4 6]
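To make the searchsorted pitfall concrete, here is a minimal sketch using the same A and B as above. It uses np.isin, which the NumPy documentation recommends over in1d for new code; np.flatnonzero is just a shorthand for np.where(...)[0], and the variable names are illustrative.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])   # sorted
B = np.array([2, 4, 6, 8])            # 8 does not occur in A

# searchsorted returns insertion points, not membership:
# for 8 it returns len(A), which is not a valid index into A.
idx = np.searchsorted(A, B)
print(idx)          # [1 3 5 7]

# A membership mask does not depend on B being a subset of A.
mask = np.isin(A, B)
in_B = np.flatnonzero(mask)       # indices of A whose elements are in B
not_in_B = np.flatnonzero(~mask)  # indices of A whose elements are not in B
print(in_B)         # [1 3 5]
print(not_in_B)     # [0 2 4 6]
```

Note that a missing value smaller than the maximum of A would not even raise an error with searchsorted; it would silently return the index of a neighbouring element.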
However, in1d() uses sort, which is slow for large datasets. You can use pandas if your dataset is large:
import pandas as pd
np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]
Here is the time comparison:
A = np.random.randint(0, 1000, 10000)
B = np.random.randint(0, 1000, 10000)
%timeit np.where(np.in1d(A, B))[0]
%timeit np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]
output:
100 loops, best of 3: 2.09 ms per loop
1000 loops, best of 3: 594 µs per loop
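Before relying on the faster variant, it may be worth checking that both approaches return the same indices. The following is a small sketch of such a check; the seed, the array sizes, and the names via_numpy and via_pandas are arbitrary choices for illustration, and np.isin stands in for in1d.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)        # fixed seed so the check is repeatable
A = rng.integers(0, 1000, 10000)
B = rng.integers(0, 1000, 500)        # small enough that some values of A are missing

# Sort-based membership test from NumPy.
via_numpy = np.where(np.isin(A, B))[0]

# Hash-based lookup: get_indexer returns -1 for values of A not found in the index.
# The index must be unique, hence pd.unique(B).
via_pandas = np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

same = np.array_equal(via_numpy, via_pandas)
print(same)
```

The timings quoted above will vary with the NumPy and pandas versions and with the sizes of A and B, so it is sensible to re-run %timeit on your own data.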
import numpy as np
A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])
C = np.searchsorted(A, B)              # indices of A whose elements are in B
D = np.delete(np.arange(len(A)), C)    # all remaining indices of A
D
#array([0, 2, 4, 6])
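The np.delete approach above only works when every element of B occurs in A. If that is not guaranteed, one possible way to keep using searchsorted is to verify each hit before using it. This is a sketch under the assumptions that A is sorted and non-empty; the names pos, found, C and D are illustrative.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])   # assumed sorted and non-empty
B = np.array([2, 4, 6, 8])            # 8 is not in A

pos = np.searchsorted(A, B)           # insertion points, may equal len(A)
safe = np.minimum(pos, len(A) - 1)    # clip so that indexing A cannot go out of bounds
found = A[safe] == B                  # True only where the value of B really occurs in A

C = pos[found]                              # indices of A whose elements are in B
D = np.setdiff1d(np.arange(len(A)), C)      # remaining indices of A
print(C)   # [1 3 5]
print(D)   # [0 2 4 6]
```

If A contains duplicates, searchsorted reports only the first occurrence of each value, so a membership mask is the safer choice in that case.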