Indices that intersect and sort two numpy arrays

Tags:

I have two numpy arrays of integers, both of length several hundred million. Within each array values are unique, and each is initially unsorted.

I would like the indices to each that yield their sorted intersection. For example:

x = np.array([4, 1, 10, 5, 8, 13, 11])
y = np.array([20, 5, 4, 9, 11, 7, 25])

Then the sorted intersection of these is [4, 5, 11], and so we want the indices that turn each of x and y into that array, so we want it to return:

mx = np.array([0, 3, 6])
my = np.array([2, 1, 4])

since then x[mx] == y[my] == np.intersect1d(x, y)

The only solution we have so far involves three different argsorts, so it seems that is unlikely to be optimal.

Each value represents a galaxy, in case that makes the problem more fun.

676

asked Nov 04 '15 18:11

JoeZuntz

2 Answers

Here's an option based on intersect1d's implementation, which is fairly straightforward. It requires one call to argsort.

The admittedly simplistic test passes.

import numpy as np


def my_intersect(x, y):
    """my_intersect(x, y) -> xm, ym
    x, y: 1-d arrays of unique values
    xm, ym: indices into x and y giving sorted intersection
    """
    # basic idea taken from numpy.lib.arraysetops.intersect1d
    aux = np.concatenate((x, y))
    sidx = aux.argsort()
    # Note: intersect1d uses aux[:-1][aux[1:]==aux[:-1]] here - I don't know why the first [:-1] is necessary
    inidx = aux[sidx[1:]] == aux[sidx[:-1]]

    # quicksort is not stable, so must do some work to extract indices
    # (if stable, sidx[inidx.nonzero()]  would be for x)
    # interlace the two sets of indices, and check against lengths
    xym = np.vstack((sidx[inidx.nonzero()],
                     sidx[1:][inidx.nonzero()])).T.flatten()

    xm = xym[xym < len(x)]
    ym = xym[xym >= len(x)] - len(x)

    return xm, ym


def check_my_intersect(x, y):
    mx, my = my_intersect(x, y)
    assert (x[mx] == np.intersect1d(x, y)).all()

    # not really necessary: np.intersect1d returns a sorted list
    assert (x[mx] == sorted(x[mx])).all()
    assert (x[mx] == y[my]).all()


def random_unique_unsorted(n):
    while True:
        x = np.unique(np.random.randint(2*n, size=n))
        if len(x):
            break
    np.random.shuffle(x)
    return x


x = np.array([4, 1, 10, 5, 8, 13, 11])
y = np.array([20, 5, 4, 9, 11, 7, 25])

check_my_intersect(x, y)


for i in range(20):
    x = random_unique_unsorted(100+i)
    y = random_unique_unsorted(200+i)
    check_my_intersect(x, y)

Edit: "Note" comment was confusing (Used ... as speech ellipsis, forgot it was a Python operator too).

answered Sep 21 '22 14:09

Rory Yorke

You could also use np.searchsorted, like so -

def searchsorted_based(x,y):

    # Get argsort for both x and y
    xsort_idx = x.argsort()
    ysort_idx = y.argsort()

    # Sort x and y and store them
    X = x[xsort_idx]
    Y = y[ysort_idx]

    # Find positions of Y in X and the matches by the positions that 
    # shift between 'left' and 'right' based searches. 
    # Use the matches posotions to get corresponding argsort for X.
    x1 = np.searchsorted(X,Y,'left') 
    x2 = np.searchsorted(X,Y,'right') 
    out1 = xsort_idx[x1[x2 != x1]]

    # Repeat for X in Y findings
    y1 = np.searchsorted(Y,X,'left') 
    y2 = np.searchsorted(Y,X,'right')
    out2 = ysort_idx[y1[y2 != y1]]

    return out1, out2

Sample run -

In [100]: x = np.array([4, 1, 10, 5, 8, 13, 11])
     ...: y = np.array([20, 5, 4, 9, 11, 7, 25])
     ...: 

In [101]: searchsorted_based(x,y)
Out[101]: (array([0, 3, 6]), array([2, 1, 4]))

answered Sep 20 '22 14:09

Divakar

Related questions
                            
                                Why don't cython compile logic or to `||` expression?
                            
                                How to make "Copy to clipboard" button/link in django admin for selected field?
                            
                                How to trigger Python script on Raspberry Pi from Node-Red
                            
                                Python Scipy: scipy.stats.spearmanr returning nans
                            
                                Uninstall and re-install pip package from python module
                            
                                How to connect to remote machine via WinRM in Python (pywinrm) using domain account?
                            
                                Select batch of rows sqlalchemy mysql
                            
                                Return std and confidence intervals for out-of-sample prediction in StatsModels
                            
                                from matplotlib import style ImportError: cannot import name 'style'
                            
                                python map exception continue mapping execution
                            
                                ipython on MacOS 10.10 - command not found
                            
                                Python, Matplotlib, Scatter plot, Change color on the clicked point
                            
                                In the Django Admin Site, how can I access model properties through an Inline?
                            
                                Encoding error using df.to_csv()
                            
                                What is the equivalent to scala.util.Try in pyspark?
                            
                                WTForms SelectField not properly coercing for booleans
                            
                                Extracting whole words based on substring matching in python
                            
                                Oracle 11g - query appears to cache even with NOCACHE hint
                            
                                Unicode character not in range when calling locale.strxfrm
                            
                                Maintain order when dumping dict to JSON

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Indices that intersect and sort two numpy arrays

Tags:

python

optimization

sorting

numpy

set-intersection

JoeZuntz

People also ask

2 Answers

Rory Yorke

Divakar

Recent Activity

Donate For Us