Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find nearest indices for one array against all values in another array - Python / NumPy

I have a list of complex numbers for which I want to find the closest value in another list of complex numbers.

My current approach with numpy:

import numpy as np

refArray = np.random.random(16);
myArray = np.random.random(1000);


def find_nearest(array, value):
    idx = (np.abs(array-value)).argmin()
    return idx;

for value in np.nditer(myArray):
    index = find_nearest(refArray, value);
    print(index);

Unfortunately, this takes ages for a large amount of values. Is there a faster or more "pythonian" way of matching each value in myArray to the closest value in refArray?

FYI: I don't necessarily need numpy in my script.

Important: the order of both myArray as well as refArray is important and should not be changed. If sorting is to be applied, the original index should be retained in some way.

like image 537
Alexander Avatar asked Jul 27 '17 11:07

Alexander


1 Answers

Here's one vectorized approach with np.searchsorted based on this post -

def closest_argmin(A, B):
    L = B.size
    sidx_B = B.argsort()
    sorted_B = B[sidx_B]
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx==L] = L-1
    mask = (sorted_idx > 0) & \
    ((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
    return sidx_B[sorted_idx-mask]

Brief explanation :

  • Get the sorted indices for the left positions. We do this with - np.searchsorted(arr1, arr2, side='left') or just np.searchsorted(arr1, arr2). Now, searchsorted expects sorted array as the first input, so we need some preparatory work there.

  • Compare the values at those left positions with the values at their immediate right positions (left + 1) and see which one is closest. We do this at the step that computes mask.

  • Based on whether the left ones or their immediate right ones are closest, choose the respective ones. This is done with the subtraction of indices with the mask values acting as the offsets being converted to ints.

Benchmarking

Original approach -

def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1

Timings and verification -

In [188]: refArray = np.random.random(16)
     ...: myArray = np.random.random(1000)
     ...: 

In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop

In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop

In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True

50x+ speedup for the posted sample and hopefully more for larger datasets!

like image 120
Divakar Avatar answered Nov 13 '22 09:11

Divakar