I have a set of n (~1000000) strings (DNA sequences) stored in a list trans. I have to find the minimum hamming distance of all sequences in the list. I implemented a naive brute force algorithm, which has been running for more than a day and has not yet given a solution. My code is
dmin=len(trans[0])
for i in xrange(len(trans)):
for j in xrange(i+1,len(trans)):
dist=hamdist(trans[i][:-1], trans[j][:-1])
if dist < dmin:
dmin = dist
Is there a more efficient method to do this? Here hamdist is a function I wrote to find hamming distances. It is
def hamdist(str1, str2):
diffs = 0
if len(str1) != len(str2):
return max(len(str1),len(str2))
for ch1, ch2 in zip(str1, str2):
if ch1 != ch2:
diffs += 1
return diffs
In order to calculate the Hamming distance between two strings, and , we perform their XOR operation, (a⊕ b), and then count the total number of 1s in the resultant string.
Example: Lets say i have 2 strings of equal length "ABCD" and "ACCC". In this example first and second last are same in both string so you need to substitute 2 character in 2nd to make it same hence, hamming distance is 2 in this case. so total hamming distance is 1 + 1 = 2.
The Hamming distance d(10101, 11110) is 3 because 10101 ⊕ 11110 is 01011 (three 1s).
You could optimize your hamdist
function by adding an optional parameter containing the minimum distance you have got so far, this way if diffs
reaches that value you stop calculating the distance because this comparison will give you a greater distance than the minimum:
def hamdist(str1, str2,prevMin=None):
diffs = 0
if len(str1) != len(str2):
return max(len(str1),len(str2))
for ch1, ch2 in zip(str1, str2):
if ch1 != ch2:
diffs += 1
if prevMin is not None and diffs>prevMin:
return None
return diffs
You will need to adapt your main loop to work with None
return value from hamdist
:
dmin=len(trans[0])
for i in xrange(len(trans)):
for j in xrange(i+1,len(trans)):
dist=hamdist(trans[i][:-1], trans[j][:-1])
if dist is not None and dist < dmin:
dmin = dist
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With