Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: How to find two equal/closest values between two separate arrays?

Let's say we have two arrays of equal length:

arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 20)

Which variable from arr1 is equal / closest to a variable from arr2?

Looking at these two lists we can easily say that the closest numbers are 4.5 and 5. I've tried to implement a function that returns two closest values given two lists and it kinda works for the examples above, but it is barely a solution because it is not optimal. And you can easily check that the function fails when we slightly change the arrays like this:

arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 18)

the values the function returns are 13 and 18

Here is the function:

def get_nearest(arr1, arr2):
    lr = [[0, 0, 0]]
    for x1 in arr1:
        for x2 in arr2:
            r = (x1 / x2 % (x1 + x2))
            print x1, x2, r
            if r <= 1 and r >= lr[0][2]:
                lr.pop()
                lr.append([x1, x2, r])
    return lr

Can you come up with a better one?

like image 441
minerals Avatar asked Dec 02 '14 00:12

minerals


4 Answers

Is speed an issue? Do you care about ties? If not, what about something simple like

from itertools import product
sorted(product(arr1, arr2), key=lambda t: abs(t[0]-t[1]))[0]

For both

arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 20)

and

arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 18)

this yields

(5, 4.5)

Explanation:

product(arr1, arr2) = [(a1, a2) for (a1, a2) in product(arr1, arr2)]

yields a list of all N**2 pairs of numbers:

[(21, 10), (21, 4.5), ..., (13, 12), (13, 20)]

Then we sort them by the absolute difference (|a1 - a2|) using sorted. By passing sorted the key keyword, we tell sorted to use the sorting criteria lambda t: abs(t[0] - t[1]). The pair with the smallest absolute difference is placed in the first index of the sorted array, so we can grab it by tacking [0] on the end.

Edit:

As suggested by Piotr in the comments, you can feed a key=func to min and max, which speeds this up considerably. Try instead:

from itertools import product
min(product(arr1, arr2), key=lambda t: abs(t[0]-t[1]))[0]
like image 130
wflynny Avatar answered Nov 03 '22 01:11

wflynny


This is the fastest algorithm I was able to write, it has n*log(n) complexity which is much faster than naive n*n approach presented in other answers. It sorts arrays before processing ( it is the most time consuming part ) and later tries to minimize the difference (this takes 2*n in the worst case):

def closest_array_items(a1, a2):
    if not a1 or not a2:
        raise ValueError('Empty array')
    a1, a2  = iter(sorted(a1)), iter(sorted(a2))
    i1, i2 = a1.next(), a2.next()
    min_dif = float('inf')
    while 1:
        dif = abs(i1 - i2)
        if dif < min_dif:
             min_dif = dif
             pair = i1, i2
             if not min_dif:
                  break
        if i1 > i2:
            try:
                i2 = a2.next()
            except StopIteration:
                break
        else:
            try:
                i1 = a1.next()
            except StopIteration:
                break
    return pair
like image 36
Piotr Dabkowski Avatar answered Nov 03 '22 01:11

Piotr Dabkowski


>>> arr1 = (21, 2, 3, 5, 13)
>>> arr2 = (10, 4.5, 9, 12, 20)
>>> for a1 in arr1:
...     for a2 in arr2:
...         if a1 > a2:
...             result.append([a1, a2, a1-a2])
...         else:
...             result.append([a1, a2, a2-a1])
>>> sorted(result, key=lambda i:i[-1])[0][:2]
[5, 4.5]

A simple way could be get difference between both arrays and sort them by their difference and get the first element.

>>> sorted([[a1,a2,a1-a2] if(a1>a2) else [a1,a2,a2-a1] for a1 in arr1 for a2 in arr2], key=lambda i:i[-1])[0][:2]
[5, 4.5]
like image 23
Tanveer Alam Avatar answered Nov 03 '22 01:11

Tanveer Alam


What about simply saving the difference and the values in each iteration...

arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 20)

diff = float("inf")

for a1 in arr1:
    for a2 in arr2:
        if abs(a1-a2) < diff:
            diff = abs(a1-a2)
            values = (a1, a2)

print(values)
like image 43
chapelo Avatar answered Nov 03 '22 01:11

chapelo