I am trying to implement the Waterman-Eggert algorithm for finding sub-optimal local sequence alignments, but am struggling to understand how to "declump" separate groups of alignments. I have the basic Smith-Waterman algorithm working fine.
A simple test aligning the following sequence against itself:
'HEAGHEAGHEAG'
'HEAGHEAGHEAG'
produces an fMatrix as follows:
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 8. 0. 0. 0. 8. 0. 0. 0. 8. 0. 0. 0.]
[ 0. 0. 13. 0. 0. 0. 13. 0. 0. 0. 13. 0. 0.]
[ 0. 0. 0. 17. 0. 0. 0. 17. 0. 0. 0. 17. 0.]
[ 0. 0. 0. 0. 23. 0. 0. 0. 23. 0. 0. 0. 23.]
[ 0. 8. 0. 0. 0. 31. 0. 0. 0. 31. 0. 0. 0.]
[ 0. 0. 13. 0. 0. 0. 36. 0. 0. 0. 36. 0. 0.]
[ 0. 0. 0. 17. 0. 0. 0. 40. 0. 0. 0. 40. 0.]
[ 0. 0. 0. 0. 23. 0. 0. 0. 46. 0. 0. 0. 46.]
[ 0. 8. 0. 0. 0. 31. 0. 0. 0. 54. 4. 0. 0.]
[ 0. 0. 13. 0. 0. 0. 36. 0. 0. 4. 59. 9. 0.]
[ 0. 0. 0. 17. 0. 0. 0. 40. 0. 0. 9. 63. 13.]
[ 0. 0. 0. 0. 23. 0. 0. 0. 46. 0. 0. 13. 69.]]
In order to find the sub-optimal alignments, e.g.
'HEAGHEAGHEAG '
' HEAGHEAGHEAG'
you have to first remove the optimal alignment (i.e. along the main diagonal) and recalculate the fMatrix; this is known as "declumping", where a "clump" of alignments is defined as any alignments whose paths intersect/share one or more pairs of aligned residues. In addition to the fMatrix, there is a secondary matrix that contains information about the direction in which fMatrix was constructed.
A snippet of the code to build the fMatrix and backtracking matrix is as follows:
# Generates fMatrix.
for i in range(1, length):
for j in range(1, length):
matchScore = fMatrix[i-1][j-1] + simMatrixDict[seq[i-1]+seq[j-1]]
insScore = fMatrix[i][j-1] + gap
delScore = fMatrix[i-1][j] + gap
fMatrix[i][j] = max(0, matchScore, insScore, delScore)
# Generates matrix for backtracking.
if fMatrix[i][j] == matchScore:
backMatrix[i][j] = 2
elif fMatrix[i][j] == insScore:
backMatrix[i][j] = 3 # INSERTION in seq - Horizontal
elif fMatrix[i][j] == delScore:
backMatrix[i][j] = 1 # DELETION in seq - Vertical
if fMatrix[i][j] >= backtrackStart:
backtrackStart = fMatrix[i][j]
endCoords = i, j
return fMatrix, backMatrix, endCoords
In order to remove this optimal alignment I have tried using this backMatrix to backtrack through the fMatrix (as per original Smith-Waterman algorithm) and set fMatrix[i][j] = 0
as I go, but this doesn't remove the entire clump, only the exact alignment in that clump.
For some background information, the Wikipedia page for the Smith-Waterman algorithm explains how fMatrix is constructed and there is an explanation on here about how backtracking works. The Waterman-Eggert algorithm is roughly explained here.
Thanks.
OK. Here's some code to do what you want. I've used the pretty print library (pprint
) so that the output looks nice. (It looks nicest if the numbers in the matrix are single digit, but the alignment gets a little messed up if there are multi-digit numbers.)
Because you only need to change numbers on the main diagonal and and those on the diagonals above and below, we just need one for loop. matrix[i][i]
is always on the main diagonal, because it is i
rows down, and i
columns across. matrix[i][i-1]
is always the lower adjacent diagonal, because it is i
rows down, and i-1
columns across. matrix[i-1][i]
is always the upper adjacent diagonal, because it is i-1
rows down, and i
rows across.
#!/usr/bin/python
import pprint
matrix = [
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,],
[ 0, 8, 0, 0, 0, 8, 0, 0, 0, 8, 0, 0, 0,],
[ 0, 0, 13, 0, 0, 0, 13, 0, 0, 0, 13, 0, 0,],
[ 0, 0, 0, 17, 0, 0, 0, 17, 0, 0, 0, 17, 0,],
[ 0, 0, 0, 0, 23, 0, 0, 0, 23, 0, 0, 0, 23,],
[ 0, 8, 0, 0, 0, 31, 0, 0, 0, 31, 0, 0, 0,],
[ 0, 0, 13, 0, 0, 0, 36, 0, 0, 0, 36, 0, 0,],
[ 0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 0, 40, 0,],
[ 0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 0, 46,],
[ 0, 8, 0, 0, 0, 31, 0, 0, 0, 54, 4, 0, 0,],
[ 0, 0, 13, 0, 0, 0, 36, 0, 0, 4, 59, 9, 0,],
[ 0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 9, 63, 13,],
[ 0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 13, 69,]]
print "Original Matrix"
pprint.pprint(matrix)
print
for i in range(len(matrix)):
matrix[i][i] = 0
if (i > 0) and (i < (len(matrix))):
matrix[i][i-1] = 0
matrix[i-1][i] = 0
print "New Matrix"
pprint.pprint(matrix)
Original Matrix
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 8, 0, 0, 0, 8, 0, 0, 0, 8, 0, 0, 0],
[0, 0, 13, 0, 0, 0, 13, 0, 0, 0, 13, 0, 0],
[0, 0, 0, 17, 0, 0, 0, 17, 0, 0, 0, 17, 0],
[0, 0, 0, 0, 23, 0, 0, 0, 23, 0, 0, 0, 23],
[0, 8, 0, 0, 0, 31, 0, 0, 0, 31, 0, 0, 0],
[0, 0, 13, 0, 0, 0, 36, 0, 0, 0, 36, 0, 0],
[0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 0, 40, 0],
[0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 0, 46],
[0, 8, 0, 0, 0, 31, 0, 0, 0, 54, 4, 0, 0],
[0, 0, 13, 0, 0, 0, 36, 0, 0, 4, 59, 9, 0],
[0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 9, 63, 13],
[0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 13, 69]]
New Matrix
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 8, 0, 0, 0, 8, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 13, 0, 0, 0, 13, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 17, 0, 0, 0, 17, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 23],
[0, 8, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0],
[0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 36, 0, 0],
[0, 0, 0, 17, 0, 0, 0, 0, 0, 0, 0, 40, 0],
[0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 46],
[0, 8, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 13, 0, 0, 0, 36, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 17, 0, 0, 0, 40, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 23, 0, 0, 0, 46, 0, 0, 0, 0]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With