If I have two strings of equal length like the following:
'aaaaabbbbbccccc'
'bbbebcccccddddd'
Is there an efficient way to align the two such that the most letters as possible line up as shown below?
'aaaaabbbbbccccc-----'
'-----bbbebcccccddddd'
The only way I can think of doing this is brute force by editing the strings and then iterating through and comparing.
Return the index which gives the maximum score, where the maximum score is the strings which have the most matching characters.
def best_overlap(a, b):
return max([(score(a[offset:], b), offset) for offset in xrange(len(a))], key=lambda x: x[0])[1]
def score(a, b):
return sum([a[i] == b[i] for i in xrange(len(a))])
>>> best_overlap(a, b)
5
>>> a + '-' * best_overlap(a, b); '-' * best_overlap(a, b) + b
'aaaaabbbbbccccc-----'
'-----bbbebcccccddddd'
Or, equivalently:
def best_match(a, b):
max = 0
max_score = 0
for offset in xrange(len(a)):
val = score(a[offset:], b)
if val > max_score:
max_score = val
max = offset
return max
There is room for optimizations such as:
Early exit for no matching characters
Early exit when maximum possible match found
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With