Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Identify all overlapping tuples in list

Tags:

python

I've currently got a list of tuples (though I control creation of the list and tuples, so they could be altered in type if needed). Each tuple has a start and end integer and a string with an ID for that range's source. What I'm trying to do is identify all of the overlapping ranges within the tuples.

Currently I have

a = [(0, 98, '122:R'), 
     (100, 210, '124:R'),
     (180, 398, '125:R'),
     (200, 298, '123:R')]
highNum = 0
highNumItem = ''
for item in a:
    if item[0] < highNum:
        print(highNumItem + ' overlaps ' + item[2])
        if item[1] > highNum:
            highNum = item[1]
            highNumItem = item[2]

    
# 124:R overlaps 125:R
# 125:R overlaps 123:R

Which outputs enough information that overlaps should be able to be manually review and fixed. But, it misses identifying some sets of overlaps. I can't help thinking there's a relatively obvious solution I'm just missing or not using the right search terms to find examples of. But ideally I'd like the output to actually be

124:R overlaps 125:R & 123:R
125:R overlaps 123:R

But using my comparison method, I can't see a way to catch the rare instance where an overlap spans more than just 2 adjacent ranges. If anyone could point me to a function or comparison method appropriate to this, I'd greatly appreciate.

Also, if it matters, I'm currently stuck with python 2.7, but need to be able to port solution to 3.x when 3rd party applications allow it.

like image 937
John Avatar asked Aug 21 '20 19:08

John


2 Answers

Checks the higher number of a pair with the lower number of another pair:

a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]

for i, base_data in enumerate(a):
    for check_data in a[i + 1:]:
        if base_data[1] > check_data[0]:
            print(f"{base_data[2]} overlaps {check_data[2]}")
#prints:
#124:R overlaps 125:R
#124:R overlaps 123:R
#125:R overlaps 123:R

If you want it stored in groups:

from collections import defaultdict

a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]
d = defaultdict(list)

for i, base_data in enumerate(a):
    for check_data in a[i + 1:]:
        if base_data[1] > check_data[0]:
            d[base_data[2]].append(check_data[2])


print(d)
#prints:
#defaultdict(<class 'list'>, {'124:R': ['125:R', '123:R'], '125:R': ['123:R']})
#but can *easily* be iterated over to pretty print:
print("\n".join([f'{key} overlaps {" & ".join(d[key])}' for key in d]))
#prints:
#124:R overlaps 125:R & 123:R
#125:R overlaps 123:R

The dictionary is far better than printing as it actually stores the data instead of just printing it. Printing the data can also be more controlled. Also, using a defaultdict over a dict makes the code more compact.

like image 156
Elan-R Avatar answered Sep 27 '22 00:09

Elan-R


Here is an example using intspan to calculate the overlaps. (Using Python 3.8)

from itertools import combinations
from intspan import intspan

a = [(0, 98, '122:R'), (100, 210, '124:R'), (180, 398, '125:R'), (200, 298, '123:R')]

d = {}
for one, two in combinations(a,2):
    # if the 2 ranges intersect
    if intspan.from_range(*one[0:2]) & intspan.from_range(*two[0:2]):
        d.setdefault(one[2], []).append(two[2])

for key, v in d.items():
    print(key + ',' + ','.join(v))

Prints:

124:R,125:R,123:R
125:R,123:R
like image 22
Chris Charley Avatar answered Sep 25 '22 00:09

Chris Charley