
Count how many NumPy arrays within a NumPy array are equal to arrays within another NumPy array of a different size

My problem

Suppose I have

a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])

They are two arrays of different sizes, each containing other arrays (the inner arrays all have the same size).

I want to count how many items of b (i.e. inner arrays) are also in a. Notice that I am not considering their position!

How can I do that?

My Try

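count = 0
for bitem in b:
    for aitem in a:
        # aitem == bitem is an elementwise boolean array, which `if` cannot evaluate directly
        if np.array_equal(aitem, bitem):
            count += 1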

Is there a better way? Ideally a one-liner, perhaps with a comprehension.

Euler_Salter asked Aug 29 '17 10:08


3 Answers

The numpy_indexed package contains efficient (generally O(n log n)) and vectorized solutions to these types of problems:

import numpy_indexed as npi
count = len(npi.intersection(a, b))

Note that this is subtly different from your double loop: for instance, it discards duplicate entries in a and b. If you want to retain duplicates in b, this would work:

count = npi.in_(b, a).sum()

Duplicate entries in a could also be handled by calling npi.count(a) and factoring in its result, but I mention this only for illustration, since I imagine the distinction probably does not matter to you.
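To make the three counting conventions above concrete, here is a rough sketch in plain Python and NumPy. It does not use numpy_indexed and is not meant to reflect how that package works internally. The arrays are made-up inputs with duplicates, and the names unique_common, b_rows_in_a, and pair_count are only illustrative.

```python
from collections import Counter

import numpy as np

# Hypothetical inputs: [5, 6] is duplicated in a, [1, 2] is duplicated in b
a = np.array([[1, 2], [3, 4], [5, 6], [5, 6]])
b = np.array([[5, 6], [1, 2], [1, 2], [3, 192]])

# Turn each row into a hashable tuple so rows can go into sets and Counters
a_rows = [tuple(row) for row in a.tolist()]
b_rows = [tuple(row) for row in b.tolist()]

# 1) Distinct rows present in both arrays (duplicates in a and b discarded)
unique_common = len(set(a_rows) & set(b_rows))

# 2) Rows of b that occur in a (duplicates in b retained)
a_set = set(a_rows)
b_rows_in_a = sum(row in a_set for row in b_rows)

# 3) Matching (a row, b row) pairs, which is what the double loop counts
a_counts = Counter(a_rows)
pair_count = sum(a_counts[row] for row in b_rows)

print(unique_common, b_rows_in_a, pair_count)  # 2 3 4
```

With no duplicates in either array, as in the question, all three numbers coincide.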

Eelco Hoogendoorn answered Oct 19 '22 20:10


Here is a simple way to do it:

a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])

count = np.count_nonzero(
    np.any(np.all(a[:, np.newaxis, :] == b[np.newaxis, :, :], axis=-1), axis=0))

print(count)  # output: 2
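The one-liner packs several broadcasting steps together. As a sketch, the same computation can be split into named intermediates to show the shape at each stage. The names eq, rows_equal, and b_in_a are introduced here for illustration, and a and b are written as plain 2-D literals, which give the same arrays as the nested np.array calls.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])  # shape (5, 2)
b = np.array([[5, 6], [1, 2], [3, 192]])                 # shape (3, 2)

# Length-1 axes let every row of a be compared with every row of b
eq = a[:, np.newaxis, :] == b[np.newaxis, :, :]  # shape (5, 3, 2), elementwise

# A row of a equals a row of b only if all of its elements match
rows_equal = np.all(eq, axis=-1)                 # shape (5, 3)

# A row of b counts if at least one row of a matches it
b_in_a = np.any(rows_equal, axis=0)              # shape (3,)

count = np.count_nonzero(b_in_a)

print(eq.shape, rows_equal.shape, b_in_a.shape)  # (5, 3, 2) (5, 3) (3,)
print(b_in_a.tolist(), count)                    # [True, True, False] 2
```

Note that the intermediate array holds len(a) * len(b) * row_length booleans, so memory use grows with the product of the two array lengths.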
jdehesa answered Oct 19 '22 21:10


You can do what you want in a one-liner as follows:

count = sum([np.array_equal(x,y) for x,y in product(a,b)])

Explanation

Here's an explanation of what's happening:

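  1. Iterate over both arrays with itertools.product, which creates an iterator over the Cartesian product of the two arrays, i.e. every pair (x, y) with x from a and y from b.
  2. Compare the two arrays in each pair (x, y) from step 1 using np.array_equal, which returns True only if they have the same shape and elements.
  3. Sum the resulting list of booleans: True counts as 1 and False as 0, so the sum is the number of matching pairs.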

Full example:

The final code looks like this:

import numpy as np
from itertools import product

a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
# Compare every (x, y) pair from the Cartesian product and count the matches
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
print(count)  # output: 2
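One caveat: summing over the Cartesian product counts matching pairs, so an item of b is counted once for every copy of it in a. If each item of b should be counted at most once, a possible variant (a sketch with made-up inputs, not part of the original answer) wraps the inner comparison in any():

```python
import numpy as np
from itertools import product

# Hypothetical inputs: [1, 2] appears twice in a
a = np.array([[1, 2], [1, 2], [5, 6]])
b = np.array([[5, 6], [1, 2], [3, 192]])

# Counts matching (x, y) pairs: [1, 2] in b matches two rows of a
pair_count = sum(np.array_equal(x, y) for x, y in product(a, b))

# Counts each item of b at most once, however often it appears in a
items_of_b_in_a = sum(any(np.array_equal(x, y) for x in a) for y in b)

print(pair_count, items_of_b_in_a)  # 3 2
```

Both versions still compare up to len(a) * len(b) pairs, so they suit small inputs best.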
Mohamed Ali JAMAOUI answered Oct 19 '22 21:10