 

Check if each element in a numpy array is in another array


This problem seems easy, but I cannot quite find a nice-looking solution. I have two NumPy arrays (A and B), and I want to get the indices of A where the elements of A are in B, and also the indices of A where the elements are not in B.

So, if

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])

Currently I am using

C = np.searchsorted(A,B) 

which takes advantage of the fact that A is sorted and gives me C = [1, 3, 5], the indices of the elements of A that are in B. This is great, but how do I get D = [0, 2, 4, 6], the indices of the elements of A that are not in B?
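A minimal sketch of one way to get D from the C above, assuming (as in the question) that A is sorted and every value of B actually occurs in A: mark the matched positions in a boolean mask and keep the positions that remain False. `np.flatnonzero(x)` is used here as a shorthand for `np.where(x)[0]` on a 1-D array.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6])

# Insertion points; these equal match positions only if every value of B occurs in A
C = np.searchsorted(A, B)

# Boolean mask over A: True where an element of B was found
mask = np.zeros(A.shape, dtype=bool)
mask[C] = True

# Indices of A whose elements are not in B
D = np.flatnonzero(~mask)

print(C)  # [1 3 5]
print(D)  # [0 2 4 6]
```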

asked Apr 11 '13 02:04 by DanHickstein




2 Answers

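searchsorted may give you a wrong answer if not every element of B is in A. You can use numpy.in1d instead (in NumPy 2.0 and later, in1d is deprecated in favor of numpy.isin, which gives the same result for 1-D arrays):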

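import numpy as np

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6,8])
mask = np.in1d(A, B)          # True where an element of A occurs in B
print(np.where(mask)[0])      # indices of A that are in B
print(np.where(~mask)[0])     # indices of A that are not in B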

output is:

[1 3 5]
[0 2 4 6]
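The failure mode mentioned above can be seen directly. np.searchsorted returns insertion points, not match positions, so with B = [2, 4, 6, 8] it returns 7 for the value 8, which is past the end of A. A minimal sketch of a guard, assuming A is sorted: keep only the insertion points that fall inside A and land on an element equal to the searched value.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])  # must be sorted for searchsorted
B = np.array([2, 4, 6, 8])

C = np.searchsorted(A, B)  # 8 gets insertion point 7, which is out of bounds for A

# Drop insertion points past the end of A
in_range = C < len(A)
C_safe = C[in_range]

# Keep only positions where A actually holds the searched value
hits = C_safe[A[C_safe] == B[in_range]]

print(C)     # [1 3 5 7]
print(hits)  # [1 3 5]
```

The in1d approach needs no such check and does not require A to be sorted, which makes it the more general choice.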

However, in1d() sorts its inputs, which can be slow for large datasets. If your dataset is large, you can use pandas instead:

import pandas as pd
np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]
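To make the pandas one-liner easier to follow, here is the same idea broken into steps on the example arrays. `Index.get_indexer` returns, for each element of A, its position in the index built from the unique values of B, or -1 if the element is absent, so the sign of that result splits A into the two requested index sets.

```python
import numpy as np
import pandas as pd

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6, 8])

# get_indexer requires a unique index, hence pd.unique
idx = pd.Index(pd.unique(B))

# Position of each element of A in idx, or -1 when it is not in B
pos = idx.get_indexer(A)

in_B = np.where(pos >= 0)[0]
not_in_B = np.where(pos < 0)[0]

print(pos)       # [-1  0 -1  1 -1  2 -1]
print(in_B)      # [1 3 5]
print(not_in_B)  # [0 2 4 6]
```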

Here is the time comparison:

A = np.random.randint(0, 1000, 10000)
B = np.random.randint(0, 1000, 10000)

%timeit np.where(np.in1d(A, B))[0]
%timeit np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

output:

100 loops, best of 3: 2.09 ms per loop
1000 loops, best of 3: 594 µs per loop
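The %timeit lines above are IPython magics. A minimal sketch of an equivalent comparison in a plain Python script, using the standard-library timeit module, is below. It substitutes np.isin for np.in1d (same result for 1-D arrays, and not deprecated) and uses a seeded generator so runs are repeatable; the function names `with_isin` and `with_pandas` are illustrative only. Timings will vary with hardware and NumPy/pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
A = rng.integers(0, 1000, 10000)
B = rng.integers(0, 1000, 10000)

def with_isin():
    # Sorting-based membership test from NumPy
    return np.where(np.isin(A, B))[0]

def with_pandas():
    # Hash-based lookup via a pandas Index of the unique values of B
    return np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

# Both approaches must select the same indices before comparing speed
assert np.array_equal(with_isin(), with_pandas())

for f in (with_isin, with_pandas):
    # Best of 3 repeats, 100 calls each, reported per call
    t = min(timeit.repeat(f, number=100, repeat=3)) / 100
    print(f"{f.__name__}: {t * 1e3:.3f} ms per call")
```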
answered Sep 17 '22 01:09 by HYRY


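import numpy as np

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])
C = np.searchsorted(A, B)
# np.alen is deprecated and no longer available in recent NumPy; len(A) is equivalent for 1-D arrays
D = np.delete(np.arange(len(A)), C)   # every index of A except those in C
D  # array([0, 2, 4, 6])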
answered Sep 21 '22 01:09 by askewchan