I have two 1-dimensional numpy.ndarray
objects, and want to work out which elements in the first array are within dx
of any element in the second.
What I have currently is
# setup
numpy.random.seed(1)
a = numpy.random.random(1000) # create one array
numpy.random.seed(2)
b = numpy.random.random(1000) # create second array
dx = 1e-4 # close-ness parameter
# function I want to optimise
def find_all_close(a, b):
# compare one number to all elements of b
def _is_coincident(t):
return (numpy.abs(b - t) <= dx).any()
# vectorize and loop over a
is_coincident = numpy.vectorize(_is_coincident)
return is_coincident(a).nonzero()[0]
which returns a timeit
result as follows
10 loops, best of 3: 16.5 msec per loop
What's the best way to optimise the find_all_close
function, especially if a
and b
are guaranteed to be float
arrays sorted in ascending order when they get passed to find_all_close
, possibly with cython or similar?
In practice I'm working with arrays between 10,000 and 100,000 elements (or larger), and running this whole operation over a few hundred different b
arrays.
Use the inbuilt ES6 function some() to iterate through each and every element of first array and to test the array. Use the inbuilt function includes() with second array to check if element exist in the first array or not. If element exist then return true else return false.
Find the closest value in array using reduce() The easiest way to do this is to use Math. abs(), so lets use that. With this function we check whether the absolute value of (b – 8) is less than the absolute value of (a – 8) and then return the winner.
So if the array is like [2, 5, 6, 7, 8, 8, 9] and the target number is 4, then closest element is 5. We can solve this by traversing through the given array and keep track of absolute difference of current element with every element. Finally return the element that has minimum absolute difference.
The easiest way to do this is for each element in the first array, do two binary searches the second array to find the element at most dx below and at most dx above the element in the first array. This is linearithmic time:
left = np.searchsorted(b, a - dx, 'left')
right = np.searchsorted(b, a + dx, 'right')
a[left != right]
The linear algorithm has two pointers into the second array that keep track of a moving window as you iterate over elements in the first array.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With