Get intersecting rows across two 2D numpy arrays

Tags: python, numpy

I want to get the intersecting (common) rows across two 2D numpy arrays. E.g., if the following arrays are passed as inputs:

array([[1, 4],
       [2, 5],
       [3, 6]])

array([[1, 4],
       [3, 6],
       [7, 8]])

the output should be:

array([[1, 4],
       [3, 6]])

I know how to do this with loops. I'm looking for a Pythonic/NumPy way to do it.

Karthik asked Nov 29 '11 20:11




2 Answers

For short arrays, using sets is probably the clearest and most readable way to do it.

Another way is to use numpy.intersect1d. You'll have to trick it into treating the rows as a single value, though... This makes things a bit less readable...

import numpy as np

A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])

nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}

C = np.intersect1d(A.view(dtype), B.view(dtype))

# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)

For large arrays, this should be considerably faster than using sets.
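If you need this in more than one place, the steps above can be wrapped in a small function. This is only a sketch: `intersect_rows` is a made-up helper name (not part of NumPy), and it assumes both inputs are 2D with the same dtype and the same number of columns. The `np.ascontiguousarray` calls are there because the view trick relies on each row sitting in one contiguous block of memory.

```python
import numpy as np

def intersect_rows(a, b):
    """Return the rows common to two 2D arrays (hypothetical helper)."""
    # The structured view needs C-contiguous data; this copies only if required.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[1]:
        raise ValueError("expected two 2D arrays with the same number of columns")
    if a.dtype != b.dtype:
        raise ValueError("expected both arrays to have the same dtype")

    ncols = a.shape[1]
    # One field per column, so each row becomes a single structured element.
    row_dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
                 'formats': ncols * [a.dtype]}
    common = np.intersect1d(a.view(row_dtype), b.view(row_dtype))
    # Convert the structured result back to a plain 2D array.
    return common.view(a.dtype).reshape(-1, ncols)

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])
result = intersect_rows(A, B)
print(result)
```

Note that `np.intersect1d` returns sorted, unique values, so the rows come back sorted and without duplicates rather than in their original order.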

Joe Kington answered Oct 11 '22 08:10


You could use Python's sets:

>>> import numpy as np
>>> A = np.array([[1,4],[2,5],[3,6]])
>>> B = np.array([[1,4],[3,6],[7,8]])
>>> aset = set([tuple(x) for x in A])
>>> bset = set([tuple(x) for x in B])
>>> np.array([x for x in aset & bset])
array([[1, 4],
       [3, 6]])

As Rob Cowie points out, this can be done more concisely as

np.array([x for x in set(tuple(x) for x in A) & set(tuple(x) for x in B)]) 

There's probably a way to do this without all the going back and forth from arrays to tuples, but it's not coming to me right now.
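One possibility that stays in NumPy the whole time is to compare every row of A against every row of B with broadcasting. This is a different technique from the set approach above, offered as a sketch rather than a recommendation: it builds a temporary boolean array of shape (len(A), len(B), ncols), so it is only reasonable when the arrays are small enough for that to fit in memory.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

# Element-wise comparison of each row of A with each row of B.
# Resulting shape: (len(A), len(B), ncols).
matches = A[:, None, :] == B[None, :, :]

# A row of A is common if all of its columns match at least one row of B.
in_both = matches.all(axis=2).any(axis=1)

common = A[in_both]
print(common)
```

Unlike the set version, this keeps the rows in the order they appear in A, and a row that is repeated in A will be repeated in the result.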

mtrw answered Oct 11 '22 10:10