I am very new with python and am struggling in creating a matrix that expresses the longest common substring. I am looking for a result like this: LCS matrix
This is my code so far.
def compute_lcs(X, Y):
m = len(X)
n = len(Y)
# An (m) times (n) matrix
matrix = [[0] * (n) for _ in range(m)]
for i in range(1, m):
for j in range(1, n):
if X[i] == Y[j]:
if i == 0 or j == 0:
matrix[i][j] = 1
else:
matrix[i][j] = matrix[i-1][j-1]+1
else:
matrix[i][j] = 0
return matrix
b = compute_lcs('AACTGGCAG','TACGCTGGA')
for y in b:
print (y)
Current Output:
[0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 1, 1, 1, 1, 1, 1, 0]
[0, 1, 0, 2, 0, 2, 2, 2, 0]
[0, 1, 2, 1, 3, 0, 3, 3, 0]
[0, 1, 2, 0, 2, 4, 0, 0, 0]
[0, 1, 2, 0, 1, 3, 0, 0, 0]
[0, 1, 0, 3, 0, 2, 4, 1, 0]
[0, 0, 2, 1, 4, 1, 3, 5, 0]
[0, 1, 1, 0, 2, 5, 0, 0, 0]
Expected Output:
[0, 0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 2, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 1, 1, 0, 0, 1]
[0, 0, 1, 0, 0, 0, 2, 0, 0]
[0, 0, 0, 2, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 3, 1, 0, 0, 0]
[0, 0, 0, 0, 1, 4, 0, 0, 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
However my result is a matrix that shows incorrect values. When I do the matrix by hand, the correct output looks like this: Correct output. I feel like my logic makes sense, what am I doing incorrectly?
Thanks everyone.
First off, to make things clear, the longest common subsequence problem is not the same as the longest common substring problem. What you're trying to solve is the latter; better not to confuse the two.
Secondly, your else branches are not aligned under the appropriate if conditions.
Whenever the strings match X[i] == Y[j], we set the matrix element to 1 if the index i or j is 0 since i-1 or j-1 at 0 gives -1 (unfortunately, this is also the index for last item in Python) which is not what we want, otherwise we increment for higher indices i, j > 1.
Thirdly, your looping should start at 0 not 1 since we start from the first characters in the strings which are at indices 0:
def compute_lcs(X, Y):
m = len(X)
n = len(Y)
# An (m) times (n) matrix
matrix = [[0] * n for _ in range(m)]
for i in range(0, m):
for j in range(0, n):
if X[i] == Y[j]:
if i == 0 or j == 0:
matrix[i][j] = 1
else:
matrix[i][j] = matrix[i-1][j-1]+1
else:
matrix[i][j] = 0
return matrix
To get the exact matrix shown in the expected output, you should swap the order of the arguments or transpose the matrix before printing. Note however that these are not necessary (swapping or transposing) and only serve for formatting purposes:
b = compute_lcs('TACGCTGGA', 'AACTGGCAG')
for y in b:
print (y)
[0, 0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 2, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 0, 1, 1, 0, 0, 1]
[0, 0, 1, 0, 0, 0, 2, 0, 0]
[0, 0, 0, 2, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 3, 1, 0, 0, 1]
[0, 0, 0, 0, 1, 4, 0, 0, 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With