Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the most efficient way to find which elements of one array are close to any element in another?

I have two 1-dimensional numpy.ndarray objects, and want to work out which elements in the first array are within dx of any element in the second.

What I have currently is

# setup
numpy.random.seed(1)
a = numpy.random.random(1000)  # create one array
numpy.random.seed(2)
b = numpy.random.random(1000)  # create second array
dx = 1e-4  # close-ness parameter

# function I want to optimise
def find_all_close(a, b):
    # compare one number to all elements of b
    def _is_coincident(t):
        return (numpy.abs(b - t) <= dx).any()
    # vectorize and loop over a
    is_coincident = numpy.vectorize(_is_coincident)
    return is_coincident(a).nonzero()[0]

which returns a timeit result as follows

10 loops, best of 3: 16.5 msec per loop

What's the best way to optimise the find_all_close function, especially if a and b are guaranteed to be float arrays sorted in ascending order when they get passed to find_all_close, possibly with cython or similar?

In practice I'm working with arrays between 10,000 and 100,000 elements (or larger), and running this whole operation over a few hundred different b arrays.

like image 705
Duncan Macleod Avatar asked Apr 01 '16 03:04

Duncan Macleod


People also ask

How do you check if an array contains any element of another array?

Use the inbuilt ES6 function some() to iterate through each and every element of first array and to test the array. Use the inbuilt function includes() with second array to check if element exist in the first array or not. If element exist then return true else return false.

How do you find the closest value in an array?

Find the closest value in array using reduce() The easiest way to do this is to use Math. abs(), so lets use that. With this function we check whether the absolute value of (b – 8) is less than the absolute value of (a – 8) and then return the winner.

How do I find the closest number in an array in C?

So if the array is like [2, 5, 6, 7, 8, 8, 9] and the target number is 4, then closest element is 5. We can solve this by traversing through the given array and keep track of absolute difference of current element with every element. Finally return the element that has minimum absolute difference.


1 Answers

The easiest way to do this is for each element in the first array, do two binary searches the second array to find the element at most dx below and at most dx above the element in the first array. This is linearithmic time:

left = np.searchsorted(b, a - dx, 'left')
right = np.searchsorted(b, a + dx, 'right')
a[left != right]

The linear algorithm has two pointers into the second array that keep track of a moving window as you iterate over elements in the first array.

like image 64
Neil G Avatar answered Nov 14 '22 23:11

Neil G