I have two sorted lists of integers. I would like to find all pairs of integers from the first and second list, respectively, that are within a certain distance of each other.
The naive approach is to check each pair, resulting in a O(N^2) time. I am sure there is a way to do it in O(N*logN) or maybe shorter.
In python, the naive O(N^2) approach is as follows:
def find_items_within(list1, list2, within):
for l1 in list1:
for l2 in list2:
if abs(l1 - l2) <= within:
yield (l1, l2)
Extra points for pythonic answers.
Application Note
I just wanted to point out the purpose of this little puzzle. I am searching a document and want to find all the occurrences of one term within a certain distance of another term. First you find the term vectors of both terms, then you can use the algorithms described below to figure out if they are within a given distance of each other.
There is no way to do it better then O(n^2)
because there are O(n^2)
pairs, and for within = infinity
you need to yield all of them.
To find the number of these pairs is a different story, and can be done by finding the first index for each element e
that suffices within-e < arr[idx]
. The index idx
can be found efficiently using binary search for example - which will get you O(nlogn)
solution to find the number of these pairs.
It can also be done in linear time (O(n)
), since you don't really need to do a binary search for all elements, after the first [a,b]
range is found, note that for each other range [a',b']
- if a>a'
then b>=b'
- so you actually need to iterate the lists with two pointers and "never look back" to get a linear time complexity.
pseudo code: (for linear time solution)
numPairs <- 0
i <- 0
a <- 0
b <- 0
while (i < list1.length):
while (a < i && list1[i] - list2[a] > within):
a <- a+1
while (b < list2.length && list2[b] - list1[i] < within):
b <- b+1
if (b > a):
numPairs <- numPairs + (b-a)
i <- i+1
return numPairs
(I made some fixes from the initial pseudo code - because the first one was aiming to find number of pairs within range in a single list - and not matches between two lists, sorry for that)
This code is O(n*log(n)+m) where m is the size of the answer.
def find_items_within(l1, l2, dist):
l1.sort()
l2.sort()
b = 0
e = 0
ans = []
for a in l1:
while b < len(l2) and a - l2[b] > dist:
b += 1
while e < len(l2) and l2[e] - a <= dist:
e += 1
ans.extend([(a,x) for x in l2[b:e]])
return ans
In the worst case, it is possible that m = n*n
, but if the answer is just a small subset of all possible pairs, this is a lot faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With