Find indices for elements in array B best matching those in array A

Tags: performance, python, arrays, indexing, numpy

I have two arrays A and B. Let them both be one-dimensional for now.
For each element in A, I need the index of the element in B that best matches it (the one with the smallest absolute difference).

I can solve this using a list comprehension

import numpy as np

A = np.array([ 1, 3, 1, 5 ])
B = np.array([ 1.1, 2.1, 3.1, 4.1, 5.1, 6.1 ])

indices = np.array([ np.argmin(np.abs(B-a)) for a in A ])

print(indices)    # prints [0 2 0 4]
print(B[indices]) # prints [1.1 3.1 1.1 5.1]

but this method is very slow for huge arrays.
Is there a faster way that uses optimized NumPy functions?

317

asked Sep 06 '21 13:09

Bastian

2 Answers

You can compute the absolute difference between B and A reshaped into a column (A[:,None]), and use argmin on axis=1:

np.argmin(np.abs(B-A[:,None]), axis=1)

output: array([0, 2, 0, 4])
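
To make the broadcasting step concrete: B - A[:,None] builds a temporary array of shape (len(A), len(B)), which can get large for huge inputs. The sketch below shows that shape and one possible way to cap the temporary by processing A in blocks. The helper name nearest_index_chunked and the chunk_size parameter are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def nearest_index_chunked(A, B, chunk_size=1024):
    """Hypothetical helper: broadcast argmin over A in blocks of chunk_size rows."""
    A = np.asarray(A)
    B = np.asarray(B)
    out = np.empty(A.shape[0], dtype=np.intp)
    for start in range(0, A.shape[0], chunk_size):
        stop = start + chunk_size
        # Temporary has shape (rows_in_block, len(B)) instead of (len(A), len(B))
        diff = np.abs(B - A[start:stop, None])
        out[start:stop] = diff.argmin(axis=1)
    return out

A = np.array([1, 3, 1, 5])
B = np.array([1.1, 2.1, 3.1, 4.1, 5.1, 6.1])

print(np.abs(B - A[:, None]).shape)               # (4, 6)
print(nearest_index_chunked(A, B, chunk_size=3))  # [0 2 0 4]
```

A smaller chunk_size lowers peak memory at the cost of more Python-level loop iterations, so the best value likely depends on the array sizes and is worth measuring.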

189

answered Sep 28 '22 14:09

mozway

Broadcasting can come back to bite you: it allocates a temporary array of shape (len(A), len(B)), and the time spent creating it counts toward the runtime. The method below does not use much temporary memory, so it is memory efficient. Here is a reference for a case where broadcasting slows down due to too much memory usage

Keeping this for reference here. Other than that, you can write custom functions in Cython with NumPy. Cython uses different optimizations than Numba, so you need to experiment to see which one optimizes better. With Numba, though, you can stay in Python and write C-like code

import numpy as np
import numba as nb

A = np.array([ 1, 3, 1, 5 ], dtype=np.float64)
B = np.array([ 1.1, 2.1, 3.1, 4.1, 5.1, 6.1 ], dtype=np.float64)

# Compile to optimized machine code
@nb.njit(
    # Signature of the inputs: two 1-D float64 arrays
    (nb.float64[:], nb.float64[:]),
    # Optional: run the outer loop in parallel
    parallel=True
)
def less_mem_ver(A, B):
    arg_mins = np.empty(A.shape, dtype=np.int64)

    # nb.prange marks the loop that parallel=True may split across threads.
    # Explicit loops avoid the temporary arrays that broadcasting creates,
    # and allocating those temporaries takes time.
    # The loops are not a slowdown here because Numba compiles them.
    for i in nb.prange(A.shape[0]):
        min_num = np.inf  # smallest distance seen so far for A[i]
        min_index = -1
        for j in range(B.shape[0]):
            t = np.abs(A[i] - B[j])
            if t < min_num:
                min_index = j
                min_num = t
        arg_mins[i] = min_index
    return arg_mins

print(less_mem_ver(A, B))  # prints [0 2 0 4]
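
If B happens to be sorted in ascending order, as it is in the question's example, another low-memory option that needs neither Numba nor Cython is a binary search with np.searchsorted. This is a different technique from the loop above, offered only as a rough sketch. It assumes B is sorted and has at least two elements, and the helper name nearest_index_sorted is made up for illustration.

```python
import numpy as np

def nearest_index_sorted(A, B_sorted):
    """Hypothetical helper: nearest-neighbor index via binary search.

    Assumes B_sorted is ascending and has at least two elements.
    """
    A = np.asarray(A)
    B_sorted = np.asarray(B_sorted)
    # Index of the first element in B_sorted that is >= each value of A
    right = np.searchsorted(B_sorted, A, side="left")
    # Clamp so that both right and right - 1 are valid indices
    right = np.clip(right, 1, len(B_sorted) - 1)
    left = right - 1
    # Pick the closer neighbor; on an exact tie prefer the left (lower) index
    choose_left = (A - B_sorted[left]) <= (B_sorted[right] - A)
    return np.where(choose_left, left, right)

A = np.array([1, 3, 1, 5])
B = np.array([1.1, 2.1, 3.1, 4.1, 5.1, 6.1])

print(nearest_index_sorted(A, B))  # [0 2 0 4]
```

The temporaries here are all of length len(A), so memory stays small. If B is not sorted, it would have to be sorted first (for example with np.argsort, mapping the result back to the original positions), and repeated values in B may yield a different index than argmin does.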

answered Sep 28 '22 15:09

eroot163pi
