Efficient algorithm to get all the combinations of numbers that are within a certain range from 2 lists in python

2 Answers

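Here is an implementation of Marat's idea from the comments: keep one list sorted and, for each x in the other list, use binary search to find the slice of values that lie within c of x.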

import bisect

def close_pairs(list1, list2, c):
  # Assumes that list2 is sorted in ascending order.
  for x in list1:
    # list2[i:j] holds exactly the values y with x - c <= y <= x + c.
    i = bisect.bisect_left(list2, x - c)
    j = bisect.bisect_right(list2, x + c)
    yield from ((x, y) for y in list2[i:j])

list_1 = [1, 5, 10]
list_2 = [3, 4, 15]
print(list(close_pairs(list_1,list_2,2)))
#prints [(1, 3), (5, 3), (5, 4)]
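
The function above relies on list2 already being sorted. As a minimal sketch (the name close_pairs_any_order is hypothetical, not from the answer), one way to handle unsorted input is to sort a copy of list2 once before the loop. list1 can stay in its original order, so the pairs come out grouped in the order of list1. The sort costs O(N log N), and each lookup is O(log N) plus the size of the slice it returns.

```python
import bisect

def close_pairs_any_order(list1, list2, c):
    # Hypothetical wrapper: sort a copy of list2 once so bisect can be used.
    sorted2 = sorted(list2)
    for x in list1:
        i = bisect.bisect_left(sorted2, x - c)
        j = bisect.bisect_right(sorted2, x + c)
        for y in sorted2[i:j]:
            yield (x, y)

pairs = list(close_pairs_any_order([10, 1, 5], [15, 4, 3], 2))
print(pairs)  # [(1, 3), (5, 3), (5, 4)]
```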

To demonstrate the potential improvement of this strategy over the "naive" approach of comparing every pair, let's time both with timeit.

import timeit

setup_naive = '''
import numpy
list_a = numpy.random.randint(0, 2500, 500).tolist()
list_b = numpy.random.randint(0, 2500, 500).tolist()
c = 2
def close_pairs(list_a, list_b, c):
    yield from ((x,y) for x in list_a for y in list_b if abs(x-y) <= c)
'''

setup_john_coleman = '''
import bisect
import numpy
list_a = numpy.random.randint(0, 2500, 500).tolist()
list_b = numpy.random.randint(0, 2500, 500).tolist()
c = 2
def close_pairs(list_a, list_b, c):
    list_a = sorted(list_a)
    list_b = sorted(list_b)
    for x in list_a:
        i = bisect.bisect_left(list_b,x-c)
        j = bisect.bisect_right(list_b,x+c)
        yield from ((x,y) for y in list_b[i:j])
'''

print(f"john_coleman: {timeit.timeit('list(close_pairs(list_a, list_b, c))', setup=setup_john_coleman, number=1000):.2f}")
print(f"naive: {timeit.timeit('list(close_pairs(list_a, list_b, c))', setup=setup_naive, number=1000):.2f}")

On a laptop, that gives results like the following (total seconds for 1000 runs):

john_coleman: 0.50
naive: 18.35
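
If only the number of close pairs is needed rather than the pairs themselves, the same two bisect calls give it directly as j - i, without building any slices. This is a sketch of that variation and is not part of the timing above; count_close_pairs is a hypothetical name.

```python
import bisect

def count_close_pairs(list_a, list_b, c):
    # Hypothetical helper: count pairs (x, y) with abs(x - y) <= c.
    list_b = sorted(list_b)
    total = 0
    for x in list_a:
        i = bisect.bisect_left(list_b, x - c)
        j = bisect.bisect_right(list_b, x + c)
        total += j - i  # number of values of list_b in [x - c, x + c]
    return total

count = count_close_pairs([1, 5, 10], [3, 4, 15], 2)
print(count)  # 3
```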

answered Oct 23 '22 17:10

John Coleman

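If the lists are already sorted, as your example suggests, you can remove the two sorted() calls, and the function then runs in O(M+N+P) time, where M and N are the list sizes and P is the number of close pairs. With the sorting included it is O(M log M + N log N + P). It keeps an index i such that ys[i] is the smallest y-value that is not too small for the current x (that is, x - ys[i] <= c), and then walks over ys[i:] for as long as the values are not too large, yielding each pair. The appended float('inf') is a sentinel that stops both while loops before they run off the end of ys.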

def close_pairs(xs, ys, c):
    xs = sorted(xs)
    ys = sorted(ys) + [float('inf')]
    i = 0
    for x in xs:
        while x - ys[i] > c:
            i += 1
        j = i
        while ys[j] - x <= c:
            yield x, ys[j]
            j += 1
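
As a quick check of the approach above, here is a sketch that runs the same function on the example lists from the question and then compares it against a brute-force comparison on random data containing duplicates. The random test sizes and the seed are arbitrary choices, not from the answer.

```python
import random

def close_pairs(xs, ys, c):
    xs = sorted(xs)
    ys = sorted(ys) + [float('inf')]  # sentinel stops both while loops
    i = 0
    for x in xs:
        # Skip y-values that are too small for x (and for every later x).
        while x - ys[i] > c:
            i += 1
        # Yield pairs until the y-values become too large for x.
        j = i
        while ys[j] - x <= c:
            yield x, ys[j]
            j += 1

example = list(close_pairs([1, 5, 10], [3, 4, 15], 2))
print(example)  # [(1, 3), (5, 3), (5, 4)]

random.seed(0)  # arbitrary seed, only for repeatability
xs = random.choices(range(50), k=40)
ys = random.choices(range(50), k=40)
brute = sorted((x, y) for x in xs for y in ys if abs(x - y) <= 2)
print(sorted(close_pairs(xs, ys, 2)) == brute)  # True
```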

Benchmark results with lists/ranges 1000 times larger than your example:

 904.4 ms  close_pairs_naive
   4.9 ms  close_pairs_John_Coleman
   1.8 ms  close_pairs_Kelly_Bundy

Benchmark code:

from timeit import timeit
import random
import bisect
from collections import deque

def close_pairs_naive(list_a, list_b, c):
    yield from ((x,y) for x in list_a for y in list_b if abs(x-y) <= c)

def close_pairs_John_Coleman(list_a, list_b, c):
    list_a = sorted(list_a)
    list_b = sorted(list_b)
    for x in list_a:
        i = bisect.bisect_left(list_b,x-c)
        j = bisect.bisect_right(list_b,x+c)
        yield from ((x,y) for y in list_b[i:j])

def close_pairs_Kelly_Bundy(xs, ys, c):
    xs = sorted(xs)
    ys = sorted(ys) + [float('inf')]
    i = 0
    for x in xs:
        while x - ys[i] > c:
            i += 1
        j = i
        while ys[j] - x <= c:
            yield x, ys[j]
            j += 1

funcs = [
    close_pairs_naive,
    close_pairs_John_Coleman,
    close_pairs_Kelly_Bundy,
]

xs = random.choices(range(15000), k=3000)
ys = random.choices(range(15000), k=3000)
c = 2
args = xs, ys, c

expect = sorted(funcs[0](*args))
for func in funcs:
    result = sorted(func(*args))
    print(result == expect, func.__name__, len(result))
print()

for _ in range(3):
    for func in funcs:
        t = timeit(lambda: deque(func(*args), 0), number=1)
        print('%6.1f ms ' % (t * 1e3), func.__name__)
    print()

answered Oct 23 '22 18:10

Kelly Bundy

Tags: python, algorithm, python-3.x

knowledge_seeker
