Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python, find if a range contains another smaller range from a list of ranges

I am looking for a efficient and fast way to do the following in Python 3.x. I am open to using third party libraries such as Numpy as long as the performance is there.

I have a list of ranges containing hundreds of thousands of entries. They're not actually range()'s, but rather the boundary numbers, such as:

list_a = [(1, 100), (300, 550), (551, 1999)]

Then, I iterate over hundred of thousands of other ranges (boundary numbers). I want to find if they contain one of the existing ranges from above. For example:

(0, 600) contains list_a[0] and list_a[1]
(550, 2000) contains list_a[2]
(2000, 2200) does not contain an existing range

Right now, doing something similar to the following, which is way too slow for big amounts of data:

for start, end in get_next_range():
    for r in list_a:
        if r[0] >= start and r[1] <= end:
            # do something
        else:
            # do something else

Any help would be much appreciated!

like image 740
jrm Avatar asked Nov 07 '22 21:11

jrm


1 Answers

I would do it following way using numpy:

import numpy as np
start = 0
finish = 600
lista = np.array([[1,100],[300,550],[551,1999]])
S = lista[:,0]>start
F = lista[:,1]<finish
contains = np.logical_and(S,F)
ind = list(np.flatnonzero(contains))
print(ind) #print [0, 1]

Explanation: firstly I made lista as np.array, then sliced it into two parts: one with lower bound ([:,0]) and second for upper bound ([:,1]) then used comparison operators, getting 1D np.arrays of bools. Using np.logical_and I got single 1D np.array with Trues for fullfiling condition and Falses for rest. Finally I used np.flatnonzero to get indices of Trues. This solution assumes that all data are in (lowerboundary,upperboundary) order. Please check if that solution is fast enough for your purpose.

like image 62
Daweo Avatar answered Nov 14 '22 21:11

Daweo