
Find indices for elements in array B best matching those in array A

I have two arrays A and B. Let them both be one-dimensional for now.
For each element in A I need the index of the element in B that best matches the element in A.

I can solve this using a list comprehension:

import numpy as np

A = np.array([ 1, 3, 1, 5 ])
B = np.array([ 1.1, 2.1, 3.1, 4.1, 5.1, 6.1 ])

indices = np.array([ np.argmin(np.abs(B-a)) for a in A ])

print(indices)    # prints [0 2 0 4]
print(B[indices]) # prints [1.1 3.1 1.1 5.1]

but this method is really slow for huge arrays.
Is there a faster way that uses optimized NumPy functions?

Bastian asked Sep 06 '21 13:09




2 Answers

You can compute the absolute difference between B and reshaped A, and use argmin on axis=1:

np.argmin(np.abs(B-A[:,None]), axis=1)

output: array([0, 2, 0, 4])
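
To see why this works, it may help to look at the intermediate array. A[:, None] has shape (len(A), 1), so subtracting it from B broadcasts to a (len(A), len(B)) matrix of pairwise differences, and argmin along axis=1 picks the closest index in B for each row. A minimal sketch reusing the arrays from the question (the names diff and indices are only for illustration):

```python
import numpy as np

A = np.array([1, 3, 1, 5])
B = np.array([1.1, 2.1, 3.1, 4.1, 5.1, 6.1])

# A[:, None] has shape (4, 1); broadcasting against B (shape (6,))
# yields one row per element of A and one column per element of B.
diff = np.abs(B - A[:, None])
print(diff.shape)   # (4, 6)

# The position of the smallest value in each row is the best match in B.
indices = diff.argmin(axis=1)
print(indices)      # [0 2 0 4]

# The temporary array holds len(A) * len(B) float64 values.
print(diff.nbytes)  # 192 (4 * 6 * 8 bytes)
```

The temporary array grows with len(A) * len(B), so for large inputs its memory footprint can become the limiting factor.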

mozway answered Sep 28 '22 14:09


Broadcasting can come back to bite you, because the time spent creating the temporary array counts as well. The method below uses very little temporary memory, so it is memory efficient. Here is a reference for a case where broadcasting slows down due to excessive memory usage

Keeping this here for reference. Alternatively, you can write custom functions in Cython. Cython uses different optimizations than Numba, so you need to experiment to find out which one optimizes better. With Numba, however, you can stay in Python and write C-like code:

import numpy as np
import numba as nb

A = np.array([ 1, 3, 1, 5 ], dtype=np.float64)
B = np.array([ 1.1, 2.1, 3.1, 4.1, 5.1, 6.1 ], dtype=np.float64)

# Compile to optimized machine code
@nb.njit(
    # Explicit input signature: two 1-D float64 arrays
    (nb.float64[:], nb.float64[:]),
    # Optional: run the outer loop in parallel
    parallel=True
)
def less_mem_ver(A, B):

    arg_mins = np.empty(A.shape, dtype=np.int64)

    # nb.prange marks the loop that can be parallelized (needs parallel=True).
    # An explicit loop avoids the temporary arrays that broadcasting creates,
    # which take time to allocate. Writing the loop costs nothing here,
    # because numba compiles it to machine code.
    for i in nb.prange(A.shape[0]):
        # Start from infinity so that any finite distance is smaller
        min_num = np.inf
        min_index = -1
        for j in range(B.shape[0]):
            t = np.abs(A[i] - B[j])
            if t < min_num:
                min_index = j
                min_num = t
        arg_mins[i] = min_index
    return arg_mins

print(less_mem_ver(A, B))  # prints [0 2 0 4]

eroot163pi answered Sep 28 '22 15:09
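
Neither answer above uses it, but if B happens to be sorted in ascending order (as it is in the question's example), a binary search with np.searchsorted may avoid both the temporary matrix and the full inner loop. The following is only a sketch under that assumption; the function name nearest_indices_sorted is hypothetical, and B is assumed to be one-dimensional with at least two elements:

```python
import numpy as np

def nearest_indices_sorted(A, B):
    """For each value in A, return the index of the closest value in B.

    Assumption: B is 1-D, sorted ascending, and has at least two elements.
    """
    # Position at which each element of A would be inserted into B.
    right = np.searchsorted(B, A)
    # Clip so that both neighbours (right - 1 and right) are valid indices.
    right = np.clip(right, 1, len(B) - 1)
    left = right - 1
    # Pick whichever neighbour is closer; ties go to the left neighbour.
    use_right = np.abs(B[right] - A) < np.abs(A - B[left])
    return np.where(use_right, right, left)

A = np.array([1, 3, 1, 5])
B = np.array([1.1, 2.1, 3.1, 4.1, 5.1, 6.1])
print(nearest_indices_sorted(A, B))  # [0 2 0 4]
```

Each lookup costs O(log len(B)) instead of O(len(B)), and the temporaries have the size of A only. If B is not sorted, this sketch does not apply as written.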