Efficiently find row intersections of two 2-D numpy arrays

Tags: python, numpy

I am trying to find an efficient way to count the elements that corresponding rows of two 2-D NumPy arrays have in common.

The two arrays have the same shape, and no value is repeated within a row.

For example:

import numpy as np

a = np.array([[2,5,6],
              [8,2,3],
              [4,1,5],
              [1,7,9]])

b = np.array([[2,3,4],  # one element (2) in common with a[0] -> 1
              [7,4,3],  # one element (3) in common with a[1] -> 1
              [5,4,1],  # three elements (5, 4, 1) in common with a[2] -> 3
              [7,6,9]]) # two elements (7, 9) in common with a[3] -> 2

My desired output is: np.array([1, 1, 3, 2])

It is easy to do this with a loop:

def get_intersect1ds(a, b):
    # One count per row pair
    result = np.empty(a.shape[0], dtype=int)
    for i in range(a.shape[0]):
        # Number of elements shared by row i of a and row i of b
        result[i] = len(np.intersect1d(a[i], b[i]))
    return result

Result:

>>> get_intersect1ds(a, b)
array([1, 1, 3, 2])

But is there a more efficient way to do it?

Akavall asked Nov 01 '13 16:11


2 Answers

If there are no duplicates within a row, you can replicate what np.intersect1d does under the hood (see its source code): stack the two arrays side by side, sort each row, and count adjacent equal values.

>>> c = np.hstack((a, b))
>>> c
array([[2, 5, 6, 2, 3, 4],
       [8, 2, 3, 7, 4, 3],
       [4, 1, 5, 5, 4, 1],
       [1, 7, 9, 7, 6, 9]])
>>> c.sort(axis=1)
>>> c
array([[2, 2, 3, 4, 5, 6],
       [2, 3, 3, 4, 7, 8],
       [1, 1, 4, 4, 5, 5],
       [1, 6, 7, 7, 9, 9]])
>>> c[:, 1:] == c[:, :-1]
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [ True, False,  True, False,  True],
       [False, False,  True, False,  True]], dtype=bool)
>>> np.sum(c[:, 1:] == c[:, :-1], axis=1)
array([1, 1, 3, 2])
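
The steps above can be wrapped in a small function. This is only a sketch: the name count_row_intersections is made up for illustration, and the result is correct only under the question's assumption that no value repeats within a row.

```python
import numpy as np

def count_row_intersections(a, b):
    """Count the elements shared by a[i] and b[i] for every row i.

    Assumes no value repeats within a single row of a or of b.
    """
    # Put each pair of rows side by side, then sort every row independently.
    c = np.sort(np.hstack((a, b)), axis=1)
    # After sorting, a value present in both rows shows up as an adjacent equal pair.
    return np.sum(c[:, 1:] == c[:, :-1], axis=1)

a = np.array([[2, 5, 6],
              [8, 2, 3],
              [4, 1, 5],
              [1, 7, 9]])
b = np.array([[2, 3, 4],
              [7, 4, 3],
              [5, 4, 1],
              [7, 6, 9]])

print(count_row_intersections(a, b))  # [1 1 3 2]
```

Using np.sort instead of the in-place c.sort returns a sorted copy, so the stacked array does not need a separate name.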

Jaime answered Sep 19 '22 00:09


This answer might not be viable for large inputs: if the input has shape (N, M), it generates an intermediate array of shape (N, M, M). Still, it's always fun to see what you can do with broadcasting:

In [43]: a
Out[43]: 
array([[2, 5, 6],
       [8, 2, 3],
       [4, 1, 5],
       [1, 7, 9]])

In [44]: b
Out[44]: 
array([[2, 3, 4],
       [7, 4, 3],
       [5, 4, 1],
       [7, 6, 9]])

In [45]: (np.expand_dims(a, -1) == np.expand_dims(b, 1)).sum(axis=-1).sum(axis=-1)
Out[45]: array([1, 1, 3, 2])

For large arrays, the method could be made more memory-friendly by doing the operation in batches.
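
A minimal sketch of the batched variant, assuming a simple loop over row slices is acceptable. The function name and the batch_size default are hypothetical and not tuned; each step only materializes a boolean array of shape (batch_size, M, M).

```python
import numpy as np

def count_row_intersections_batched(a, b, batch_size=1024):
    """Broadcasting-based row intersection counts, computed in row batches.

    Assumes a and b have the same shape and no value repeats within a row.
    """
    n = a.shape[0]
    result = np.empty(n, dtype=np.intp)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # Compare every element of a[i] with every element of b[i]: shape (k, M, M).
        eq = a[start:stop, :, None] == b[start:stop, None, :]
        # Each True is one shared element, so sum over both comparison axes.
        result[start:stop] = eq.sum(axis=(1, 2))
    return result

a = np.array([[2, 5, 6],
              [8, 2, 3],
              [4, 1, 5],
              [1, 7, 9]])
b = np.array([[2, 3, 4],
              [7, 4, 3],
              [5, 4, 1],
              [7, 6, 9]])

print(count_row_intersections_batched(a, b, batch_size=3))  # [1 1 3 2]
```

A larger batch_size means fewer Python-level iterations but a bigger temporary array, so the right value depends on M and on available memory.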

Warren Weckesser answered Sep 20 '22 00:09