Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given Two Lists of Integers, Find Each Pair Within a Distance of Each Other < O(N^2)

I have two sorted lists of integers. I would like to find all pairs of integers from the first and second list, respectively, that are within a certain distance of each other.

The naive approach is to check each pair, resulting in a O(N^2) time. I am sure there is a way to do it in O(N*logN) or maybe shorter.

In python, the naive O(N^2) approach is as follows:

def find_items_within(list1, list2, within):
    for l1 in list1:
        for l2 in list2:
            if abs(l1 - l2) <= within:
                yield (l1, l2)

Extra points for pythonic answers.

Application Note

I just wanted to point out the purpose of this little puzzle. I am searching a document and want to find all the occurrences of one term within a certain distance of another term. First you find the term vectors of both terms, then you can use the algorithms described below to figure out if they are within a given distance of each other.

like image 520
speedplane Avatar asked Nov 29 '12 14:11

speedplane


2 Answers

There is no way to do it better then O(n^2) because there are O(n^2) pairs, and for within = infinity you need to yield all of them.


To find the number of these pairs is a different story, and can be done by finding the first index for each element e that suffices within-e < arr[idx]. The index idx can be found efficiently using binary search for example - which will get you O(nlogn) solution to find the number of these pairs.

It can also be done in linear time (O(n)), since you don't really need to do a binary search for all elements, after the first [a,b] range is found, note that for each other range [a',b'] - if a>a' then b>=b' - so you actually need to iterate the lists with two pointers and "never look back" to get a linear time complexity.

pseudo code: (for linear time solution)

numPairs <- 0
i <- 0
a <- 0
b <- 0
while (i < list1.length):
  while (a < i && list1[i] - list2[a] > within):
      a <- a+1
  while (b < list2.length && list2[b] - list1[i] < within):
      b <- b+1
  if (b > a):
      numPairs <- numPairs + (b-a)
  i <- i+1
return numPairs

(I made some fixes from the initial pseudo code - because the first one was aiming to find number of pairs within range in a single list - and not matches between two lists, sorry for that)

like image 73
amit Avatar answered Sep 25 '22 21:09

amit


This code is O(n*log(n)+m) where m is the size of the answer.

def find_items_within(l1, l2, dist):
    l1.sort()
    l2.sort()
    b = 0
    e = 0
    ans = []
    for a in l1:
        while b < len(l2) and a - l2[b] > dist:
            b += 1
        while e < len(l2) and l2[e] - a <= dist:
            e += 1
        ans.extend([(a,x) for x in l2[b:e]])
    return ans

In the worst case, it is possible that m = n*n, but if the answer is just a small subset of all possible pairs, this is a lot faster.

like image 23
Thomash Avatar answered Sep 23 '22 21:09

Thomash