Get intersecting rows across two 2D numpy arrays

Tags: python, numpy

I want to get the intersecting (common) rows across two 2D numpy arrays. E.g., if the following arrays are passed as inputs:

array([[1, 4],
       [2, 5],
       [3, 6]])

array([[1, 4],
       [3, 6],
       [7, 8]])

the output should be:

array([[1, 4],
       [3, 6]])

I know how to do this with loops. I'm looking for a Pythonic/NumPy way to do this.

asked Nov 29 '11 20:11

Karthik

2 Answers

For short arrays, using sets is probably the clearest and most readable way to do it.
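A minimal sketch of the set-based approach, for reference. It assumes both arrays are 2D with the same number of columns; the names `common` and `result` are only illustrative. Each row is converted to a tuple so it can be hashed, and the result is sorted because sets carry no ordering guarantee.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

# Rows become hashable tuples so they can be stored in sets.
common = set(map(tuple, A)) & set(map(tuple, B))

# Sets are unordered, so sort before rebuilding the array to get a stable result.
result = np.array(sorted(common))
print(result)
```

Note that if no rows are shared, `np.array([])` has shape `(0,)` rather than `(0, ncols)`, so an empty intersection may need separate handling.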

Another way is to use numpy.intersect1d. It works on flattened 1D input, though, so you have to trick it into treating each row as a single value by viewing both arrays with a structured dtype. This makes things a bit less readable:

import numpy as np

A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])

nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}

C = np.intersect1d(A.view(dtype), B.view(dtype))

# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)

For large arrays, this should be considerably faster than using sets.
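The same idea can be wrapped in a small helper. This is only a sketch: the name `intersect_rows` is hypothetical, and it assumes both inputs are 2D and that `b` can safely be cast to `a`'s dtype. The arrays are made C-contiguous first, since the structured view needs each row to sit in one contiguous block of memory.

```python
import numpy as np


def intersect_rows(a, b):
    """Return the sorted unique rows present in both 2D arrays (hypothetical helper)."""
    a = np.ascontiguousarray(a)
    # Assumption: b can be cast to a's dtype without losing information.
    b = np.ascontiguousarray(b, dtype=a.dtype)
    ncols = a.shape[1]
    if b.shape[1] != ncols:
        raise ValueError("both arrays must have the same number of columns")
    # One structured field per column, so each row becomes a single record.
    row_dtype = np.dtype([("f{}".format(i), a.dtype) for i in range(ncols)])
    common = np.intersect1d(a.view(row_dtype), b.view(row_dtype))
    # View the records as plain numbers again and restore the 2D shape.
    return common.view(a.dtype).reshape(-1, ncols)


A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])
print(intersect_rows(A, B))
```

Because `np.intersect1d` returns sorted, unique values, the rows come back in lexicographic order with duplicates removed, not in the order they appear in `A`.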

answered Oct 11 '22 08:10

Joe Kington

You could use Python's sets:

>>> import numpy as np
>>> A = np.array([[1,4],[2,5],[3,6]])
>>> B = np.array([[1,4],[3,6],[7,8]])
>>> aset = set([tuple(x) for x in A])
>>> bset = set([tuple(x) for x in B])
>>> np.array([x for x in aset & bset])
array([[1, 4],
       [3, 6]])

As Rob Cowie points out, this can be done more concisely as

np.array([x for x in set(tuple(x) for x in A) & set(tuple(x) for x in B)])

There's probably a way to do this without all the going back and forth from arrays to tuples, but it's not coming to me right now.
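One possible way to skip the tuples is to compare every row of one array with every row of the other using broadcasting. This is a sketch under the assumption that both arrays are small enough for an intermediate boolean array of shape `(len(A), len(B), ncols)` to fit in memory; the names `matches` and `common` are illustrative.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

# Compare each row of A with each row of B: result shape is (len(A), len(B)).
# matches[i, j] is True when row i of A equals row j of B in every column.
matches = (A[:, None, :] == B[None, :, :]).all(axis=2)

# Keep the rows of A that match at least one row of B.
common = A[matches.any(axis=1)]
print(common)
```

Unlike the set version, this keeps the rows in the order they appear in `A` and does not remove duplicate rows of `A`. The pairwise comparison costs memory proportional to `len(A) * len(B)`, so it suits small inputs better than large ones.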

answered Oct 11 '22 10:10

mtrw
