What's the most efficient way to find which elements of one array are close to any element in another?

I have two 1-dimensional numpy.ndarray objects, and want to work out which elements in the first array are within dx of any element in the second.

What I have currently is

# setup
import numpy
numpy.random.seed(1)
a = numpy.random.random(1000)  # create one array
numpy.random.seed(2)
b = numpy.random.random(1000)  # create second array
dx = 1e-4  # close-ness parameter

# function I want to optimise
def find_all_close(a, b):
    # compare one number to all elements of b
    def _is_coincident(t):
        return (numpy.abs(b - t) <= dx).any()
    # vectorize and loop over a
    is_coincident = numpy.vectorize(_is_coincident)
    return is_coincident(a).nonzero()[0]

which returns a timeit result as follows

10 loops, best of 3: 16.5 msec per loop

What's the best way to optimise find_all_close, possibly with Cython or similar, especially given that a and b are guaranteed to be float arrays sorted in ascending order when they are passed in?

In practice I'm working with arrays between 10,000 and 100,000 elements (or larger), and running this whole operation over a few hundred different b arrays.

asked Apr 01 '16 03:04

Duncan Macleod

1 Answer

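The easiest way to do this is, for each element of the first array, to do two binary searches of the second (sorted) array: one for the first element that is no smaller than the element minus dx, and one for the first element that is greater than the element plus dx. If the two positions differ, at least one element of the second array lies within dx. This takes linearithmic time: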

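import numpy as np
left = np.searchsorted(b, a - dx, 'left')    # first index with b >= a - dx
right = np.searchsorted(b, a + dx, 'right')  # first index with b > a + dx
a[left != right]  # values of a with at least one element of b within dx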
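
The question's find_all_close returns indices rather than values, so here is a minimal sketch of the same searchsorted idea wrapped as a function that returns indices. The name find_all_close_sorted and the explicit dx argument are assumptions for illustration, not part of the original answer. The np.sort call is only a safeguard and can be dropped when b is already sorted.

```python
import numpy as np

def find_all_close_sorted(a, b, dx):
    """Return indices i such that some element of b lies within dx of a[i]."""
    # searchsorted requires a sorted second array; sorting here is a safeguard
    # (an assumption of this sketch) and is redundant if b is already sorted.
    b = np.sort(b)
    left = np.searchsorted(b, a - dx, side='left')    # first index with b >= a - dx
    right = np.searchsorted(b, a + dx, side='right')  # first index with b > a + dx
    # A non-empty slice b[left:right] means at least one match.
    return np.flatnonzero(left != right)

a = np.array([0.0, 1.0, 2.0, 5.0])
b = np.array([0.5, 2.25, 9.0])
close_idx = find_all_close_sorted(a, b, 0.5)
```

Only b needs to be sorted for this version; a can be in any order.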

If both arrays are sorted, a linear-time alternative keeps two pointers into the second array that track a moving window as you iterate over the elements of the first array.
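
The moving-window idea can be sketched in plain Python as follows. This is one possible reading of the description above, not the answerer's own code; the function name find_all_close_linear is hypothetical, and both a and b are assumed to be sorted in ascending order with dx >= 0.

```python
import numpy as np

def find_all_close_linear(a, b, dx):
    """Indices i where some b[j] satisfies |a[i] - b[j]| <= dx.

    Assumes a and b are both sorted ascending and dx >= 0.
    """
    n = len(b)
    lo = hi = 0  # the window b[lo:hi] holds the elements within dx of the current t
    hits = []
    for i, t in enumerate(a):
        # Drop elements that are too small for t (and for every later, larger t).
        while lo < n and b[lo] < t - dx:
            lo += 1
        # Extend the window over elements that are not too large for t.
        while hi < n and b[hi] <= t + dx:
            hi += 1
        if lo != hi:  # non-empty window: at least one close element
            hits.append(i)
    return np.array(hits, dtype=np.intp)

a = np.array([0.0, 1.0, 2.0, 5.0])
b = np.array([0.5, 2.25, 9.0])
linear_idx = find_all_close_linear(a, b, 0.5)
```

Each pointer only ever moves forward, so the loop does O(len(a) + len(b)) work in total. As a pure-Python loop it will probably be slower than the searchsorted version for arrays of this size unless it is compiled, for example with Cython as the question suggests.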

answered Nov 14 '22 23:11

Neil G

Tags: python, arrays, algorithm, numpy
