I am looking for a efficient and fast way to do the following in Python 3.x. I am open to using third party libraries such as Numpy as long as the performance is there.
I have a list of ranges containing hundreds of thousands of entries. They're not actually range()'s, but rather the boundary numbers, such as:
list_a = [(1, 100), (300, 550), (551, 1999)]
Then, I iterate over hundred of thousands of other ranges (boundary numbers). I want to find if they contain one of the existing ranges from above. For example:
(0, 600) contains list_a[0] and list_a[1]
(550, 2000) contains list_a[2]
(2000, 2200) does not contain an existing range
Right now, doing something similar to the following, which is way too slow for big amounts of data:
for start, end in get_next_range():
for r in list_a:
if r[0] >= start and r[1] <= end:
# do something
else:
# do something else
Any help would be much appreciated!
I would do it following way using numpy
:
import numpy as np
start = 0
finish = 600
lista = np.array([[1,100],[300,550],[551,1999]])
S = lista[:,0]>start
F = lista[:,1]<finish
contains = np.logical_and(S,F)
ind = list(np.flatnonzero(contains))
print(ind) #print [0, 1]
Explanation: firstly I made lista
as np.array
, then sliced it into two parts: one with lower bound ([:,0]
) and second for upper bound ([:,1]
) then used comparison operators, getting 1D np.array
s of bool
s. Using np.logical_an
d I got single 1D np.array
with True
s for fullfiling condition and False
s for rest. Finally I used np.flatnonzero
to get indices of True
s. This solution assumes that all data are in (lowerboundary,upperboundary)
order. Please check if that solution is fast enough for your purpose.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With