I am trying to figure out an efficient way of finding the row-wise intersections of two np.arrays.
The two arrays have the same shape, and no row contains duplicate values.
For example:
import numpy as np

a = np.array([[2,5,6],
              [8,2,3],
              [4,1,5],
              [1,7,9]])
b = np.array([[2,3,4],   # one element (2) in common with a[0] -> 1
              [7,4,3],   # one element (3) in common with a[1] -> 1
              [5,4,1],   # three elements (5,4,1) in common with a[2] -> 3
              [7,6,9]])  # two elements (9,7) in common with a[3] -> 2
My desired output is: np.array([1,1,3,2])
It is easy to do this with a loop:
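def get_intersect1ds(a, b):
    # np.int was removed from NumPy; the builtin int works as the dtype.
    result = np.empty(a.shape[0], dtype=int)
    # range replaces the Python 2-only xrange.
    for i in range(a.shape[0]):
        result[i] = len(np.intersect1d(a[i], b[i]))
    return result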
Result:
>>> get_intersect1ds(a, b)
array([1, 1, 3, 2])
But is there a more efficient way to do it?
If you have no duplicates within a row, you can try to replicate what np.intersect1d does under the hood (see its source code). Stack the two arrays side by side, sort each row, and count adjacent equal pairs:
>>> c = np.hstack((a, b))
>>> c
array([[2, 5, 6, 2, 3, 4],
       [8, 2, 3, 7, 4, 3],
       [4, 1, 5, 5, 4, 1],
       [1, 7, 9, 7, 6, 9]])
>>> c.sort(axis=1)
>>> c
array([[2, 2, 3, 4, 5, 6],
       [2, 3, 3, 4, 7, 8],
       [1, 1, 4, 4, 5, 5],
       [1, 6, 7, 7, 9, 9]])
>>> c[:, 1:] == c[:, :-1]
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [ True, False,  True, False,  True],
       [False, False,  True, False,  True]], dtype=bool)
>>> np.sum(c[:, 1:] == c[:, :-1], axis=1)
array([1, 1, 3, 2])
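The session above can be collected into a single function. This is only a sketch: the name row_intersection_counts is made up for illustration, and it relies on the stated assumption that no row of a or b contains duplicates (otherwise a repeated value inside one row would be counted as a match).

```python
import numpy as np

def row_intersection_counts(a, b):
    """Count the values shared by a[i] and b[i] for every row i.

    Assumes a and b have the same 2-D shape and that no row
    contains duplicate values.
    """
    # Put each pair of rows side by side, then sort within the row.
    c = np.sort(np.hstack((a, b)), axis=1)
    # With no duplicates inside a row, two equal neighbours can only be
    # a value that occurs once in a[i] and once in b[i].
    return np.count_nonzero(c[:, 1:] == c[:, :-1], axis=1)

a = np.array([[2, 5, 6], [8, 2, 3], [4, 1, 5], [1, 7, 9]])
b = np.array([[2, 3, 4], [7, 4, 3], [5, 4, 1], [7, 6, 9]])
print(row_intersection_counts(a, b))  # [1 1 3 2]
```

Unlike the in-place c.sort(axis=1) in the session, np.sort returns a sorted copy, so the stacked array is not modified after creation and the inputs are left untouched either way.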
This answer might not be viable, because for inputs of shape (N, M) it generates an intermediate array of shape (N, M, M), but it's always fun to see what you can do with broadcasting:
In [43]: a
Out[43]:
array([[2, 5, 6],
       [8, 2, 3],
       [4, 1, 5],
       [1, 7, 9]])
In [44]: b
Out[44]:
array([[2, 3, 4],
       [7, 4, 3],
       [5, 4, 1],
       [7, 6, 9]])
In [45]: (np.expand_dims(a, -1) == np.expand_dims(b, 1)).sum(axis=-1).sum(axis=-1)
Out[45]: array([1, 1, 3, 2])
For large arrays, the method could be made more memory-friendly by doing the operation in batches.
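One way the batching might look is sketched below. The function name row_intersection_counts_batched and the batch_size parameter (with its default of 1024) are illustrative assumptions, not part of the original answer. Each batch builds a temporary boolean array of shape (batch_size, M, M) rather than (N, M, M).

```python
import numpy as np

def row_intersection_counts_batched(a, b, batch_size=1024):
    """Row-wise intersection counts via broadcasting, one batch at a time.

    Assumes a and b have the same 2-D shape and that no row
    contains duplicate values.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    out = np.empty(a.shape[0], dtype=np.intp)
    for start in range(0, a.shape[0], batch_size):
        stop = start + batch_size  # slicing past the end is safe
        # eq[i, j, k] is True where a[start+i, j] == b[start+i, k].
        eq = a[start:stop, :, None] == b[start:stop, None, :]
        # Count the matching pairs in each row of the batch.
        out[start:stop] = eq.sum(axis=(1, 2))
    return out

a = np.array([[2, 5, 6], [8, 2, 3], [4, 1, 5], [1, 7, 9]])
b = np.array([[2, 3, 4], [7, 4, 3], [5, 4, 1], [7, 6, 9]])
print(row_intersection_counts_batched(a, b, batch_size=3))  # [1 1 3 2]
```

The a[:, :, None] and b[:, None, :] indexing is equivalent to the np.expand_dims calls above. A suitable batch_size depends on M and on available memory, so the default here is just a placeholder.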