I've currently got a list of tuples (though I control creation of the list and tuples, so they could be altered in type if needed). Each tuple has a start and end integer and a string with an ID for that range's source. What I'm trying to do is identify all of the overlapping ranges within the tuples.
Currently I have
a = [(0, 98, '122:R'),
(100, 210, '124:R'),
(180, 398, '125:R'),
(200, 298, '123:R')]
highNum = 0
highNumItem = ''
for item in a:
if item[0] < highNum:
print(highNumItem + ' overlaps ' + item[2])
if item[1] > highNum:
highNum = item[1]
highNumItem = item[2]
# 124:R overlaps 125:R
# 125:R overlaps 123:R
Which outputs enough information that overlaps should be able to be manually review and fixed. But, it misses identifying some sets of overlaps. I can't help thinking there's a relatively obvious solution I'm just missing or not using the right search terms to find examples of. But ideally I'd like the output to actually be
124:R overlaps 125:R & 123:R
125:R overlaps 123:R
But using my comparison method, I can't see a way to catch the rare instance where an overlap spans more than just 2 adjacent ranges. If anyone could point me to a function or comparison method appropriate to this, I'd greatly appreciate.
Also, if it matters, I'm currently stuck with python 2.7, but need to be able to port solution to 3.x when 3rd party applications allow it.
Checks the higher number of a pair with the lower number of another pair:
a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
for i, base_data in enumerate(a):
for check_data in a[i + 1:]:
if base_data[1] > check_data[0]:
print(f"{base_data[2]} overlaps {check_data[2]}")
#prints:
#124:R overlaps 125:R
#124:R overlaps 123:R
#125:R overlaps 123:R
If you want it stored in groups:
from collections import defaultdict
a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
d = defaultdict(list)
for i, base_data in enumerate(a):
for check_data in a[i + 1:]:
if base_data[1] > check_data[0]:
d[base_data[2]].append(check_data[2])
print(d)
#prints:
#defaultdict(<class 'list'>, {'124:R': ['125:R', '123:R'], '125:R': ['123:R']})
#but can *easily* be iterated over to pretty print:
print("\n".join([f'{key} overlaps {" & ".join(d[key])}' for key in d]))
#prints:
#124:R overlaps 125:R & 123:R
#125:R overlaps 123:R
The dictionary is far better than printing as it actually stores the data instead of just printing it. Printing the data can also be more controlled. Also, using a defaultdict
over a dict
makes the code more compact.
Here is an example using intspan to calculate the overlaps. (Using Python 3.8)
from itertools import combinations
from intspan import intspan
a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
d = {}
for one, two in combinations(a,2):
# if the 2 ranges intersect
if intspan.from_range(*one[0:2]) & intspan.from_range(*two[0:2]):
d.setdefault(one[2], []).append(two[2])
for key, v in d.items():
print(key + ',' + ','.join(v))
Prints:
124:R,125:R,123:R
125:R,123:R
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With