Python: check if a NumPy array contains any element of another array

Tags:

python

numpy

What is the best way to check whether a NumPy array contains any element of another array?

Example:

array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]

I want to get True if array1 contains any value of array2, and False otherwise.

asked Mar 23 '16 at 23:03 by Alex



2 Answers

Using pandas, you can use isin:

import numpy as np
import pandas as pd
a1 = np.array([10,5,4,13,10,1,1,22,7,3,15,9])
a2 = np.array([3,4,9,10,13,15,16,18,19,20,21,22,23])

>>> pd.Series(a1).isin(a2).any()
True

And using the NumPy in1d function (per the comment from @Norman):

>>> np.any(np.in1d(a1, a2))
True
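On current NumPy versions, np.in1d is deprecated (since NumPy 2.0) and the documentation points to np.isin instead. A minimal sketch of the same check with np.isin, reusing the arrays from the question; the variable names mask and has_common are illustrative. np.isin returns a boolean mask with the same shape as its first argument, so the mask can also tell you which elements matched.

```python
import numpy as np

a1 = np.array([10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9])
a2 = np.array([3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23])

# Boolean mask, same shape as a1: True where an element of a1 appears in a2.
mask = np.isin(a1, a2)

# Reduce the mask to a single Python bool for the "any common element?" question.
has_common = bool(mask.any())

print(has_common)          # True
print(a1[mask].tolist())   # [10, 4, 13, 10, 22, 3, 15, 9]
```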

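For small arrays such as those in this example, the set solution is the fastest in the timings below, with the generator expression close behind (note that the set timing works on the original Python lists array1 and array2 rather than on the NumPy arrays a1 and a2). For larger arrays with no overlap, the pandas and NumPy solutions are faster than the set approach, and np.intersect1d is the fastest of all.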
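Unlike the isin/in1d versions, np.intersect1d returns the shared values themselves (sorted and de-duplicated), not True/False, so one more step is needed to answer the question. A minimal sketch, assuming a size check is an acceptable way to turn the result into a boolean; calling bool() directly on a multi-element array would raise ValueError instead.

```python
import numpy as np

a1 = np.array([10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9])
a2 = np.array([3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23])

common = np.intersect1d(a1, a2)   # sorted unique values present in both arrays
print(common.tolist())            # [3, 4, 9, 10, 13, 15, 22]

# An empty result means no overlap; .size > 0 yields a plain Python bool.
intersect_result = common.size > 0
print(intersect_result)           # True

# Shifting a1 by 100 moves every value above a1's maximum (22): no overlap.
no_overlap = np.intersect1d(a1, a1 + 100).size > 0
print(no_overlap)                 # False
```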

Small arrays (12-13 elements)

%timeit set(array1) & set(array2)
The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.69 µs per loop

%timeit any(i in a1 for i in a2)
The slowest run took 12.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.88 µs per loop

%timeit np.intersect1d(a1, a2)
The slowest run took 10.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 15.6 µs per loop

%timeit np.any(np.in1d(a1, a2))
10000 loops, best of 3: 27.1 µs per loop

%timeit pd.Series(a1).isin(a2).any()
10000 loops, best of 3: 135 µs per loop
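The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module, reporting the best of three runs per call like the old %timeit summary. Assumptions: pandas is left out so the script only needs NumPy, np.isin is used in place of np.in1d to avoid deprecation warnings on NumPy 2.x, and the helper name best_per_call is hypothetical. Absolute numbers will vary by machine and library version, so none are quoted here.

```python
import timeit

# Shared setup, executed once per timing run (not included in the timing).
SETUP = """
import numpy as np
array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]
a1, a2 = np.array(array1), np.array(array2)
"""

# The same expressions as the small-array timings above.
statements = [
    "set(array1) & set(array2)",
    "any(i in a1 for i in a2)",
    "np.intersect1d(a1, a2)",
    "np.any(np.isin(a1, a2))",
]


def best_per_call(stmt, number=2_000, repeat=3):
    """Best-of-`repeat` time per call, in microseconds."""
    totals = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(totals) / number * 1e6


if __name__ == "__main__":
    for stmt in statements:
        print(f"{best_per_call(stmt):10.2f} µs  {stmt}")
```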

Using two arrays of 100k elements each with no overlap (a3 holds values below 100000, while a4 = a3 + 100000 holds values of at least 100000):

a3 = np.random.randint(0, 100000, 100000)
a4 = a3 + 100000

%timeit np.intersect1d(a3, a4)
100 loops, best of 3: 13.8 ms per loop    

%timeit pd.Series(a3).isin(a4).any()
100 loops, best of 3: 18.3 ms per loop

%timeit np.any(np.in1d(a3, a4))
100 loops, best of 3: 18.4 ms per loop

%timeit set(a3) & set(a4)
10 loops, best of 3: 23.6 ms per loop

%timeit any(i in a3 for i in a4)
1 loops, best of 3: 34.5 s per loop
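The generator expression is so slow here because, for a NumPy array, i in a3 compares i against every element of a3, so the whole expression performs on the order of len(a3) * len(a4) comparisons, and with no overlap any() never gets to stop early. A minimal sketch of a fix, assuming it is acceptable to build one Python set up front: convert a3 to a set once so each membership test is a hash lookup. The helper name shares_any and the use of np.random.default_rng (instead of the legacy np.random.randint above) are illustrative choices, not part of the original answer.

```python
import numpy as np


def shares_any(a, b):
    """Return True if any element of b also occurs in a.

    Building the set once makes each lookup O(1) on average, so the check is
    roughly O(len(a) + len(b)) instead of O(len(a) * len(b)).
    """
    lookup = set(a.tolist())                   # convert to Python ints once
    return any(x in lookup for x in b.tolist())  # stops at the first match


rng = np.random.default_rng(0)
a3 = rng.integers(0, 100000, 100000)   # values in [0, 100000)
a4 = a3 + 100000                       # values in [100000, 200000): no overlap

print(shares_any(a3, a4))        # False
print(shares_any(a3, a3[:10]))   # True
```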
answered Oct 24 '22 at 16:10 by Alexander


You can try intersecting the two lists as sets:

>>> array1 = [10,5,4,13,10,1,1,22,7,3,15,9]
>>> array2 = [3,4,9,10,13,15,16,18,19,20,21,22,23]
>>> set(array1) & set(array2)
{3, 4, 9, 10, 13, 15, 22}

If the result is non-empty, the two arrays have at least one element in common.

If the result is empty, they have no common elements.
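Since the question asks for True/False rather than the common values, the intersection can be turned into a boolean directly, because an empty set is falsy. A minimal sketch of two ways to do that with plain Python sets; set.isdisjoint accepts any iterable and does not need to build the intersection, and in CPython it can return as soon as it finds a shared element.

```python
array1 = [10, 5, 4, 13, 10, 1, 1, 22, 7, 3, 15, 9]
array2 = [3, 4, 9, 10, 13, 15, 16, 18, 19, 20, 21, 22, 23]

# bool() of the intersection: True when at least one element is shared.
print(bool(set(array1) & set(array2)))        # True

# isdisjoint() is True when there is NO overlap, so negate it.
print(not set(array1).isdisjoint(array2))     # True
print(not set(array1).isdisjoint([100, 200])) # False
```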

answered Oct 24 '22 at 15:10 by Nilesh