I am new in Python, and have already spend to many hours with this problem, hope somebody can help me. I need to find the overlap between 2 sequences. The overlap is in the left end of the first sequences and the right end of the second one. I want the function to find the overlap, and return it.
My sequences are:
s1 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
s2 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
My function should be named
def getOverlap(left, right)
With s1
being the left sequence, and the s2
being the right one.
The outcome should be
‘GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC’
Any help is appreciated
Have a look at the difflib
library and more precisely at find_longest_match()
:
import difflib
def get_overlap(s1, s2):
s = difflib.SequenceMatcher(None, s1, s2)
pos_a, pos_b, size = s.find_longest_match(0, len(s1), 0, len(s2))
return s1[pos_a:pos_a+size]
s1 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
s2 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
print(get_overlap(s1, s2)) # GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
You could use difflib.SequenceMatcher
:
d = difflib.SequenceMatcher(None,s1,s2)
>>> match = max(d.get_matching_blocks(),key=lambda x:x[2])
>>> match
Match(a=8, b=0, size=39)
>>> i,j,k = match
>>> d.a[i:i+k]
'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC'
>>> d.a[i:i+k] == d.b[j:j+k]
True
>>> d.a == s1
True
>>> d.b == s2
True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With