Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Longest common substring matrix

I am very new with python and am struggling in creating a matrix that expresses the longest common substring. I am looking for a result like this: LCS matrix

This is my code so far.

def compute_lcs(X, Y):
    m = len(X)
    n = len(Y)
# An (m) times (n) matrix
    matrix = [[0] * (n) for _ in range(m)]
    for i in range(1, m):
        for j in range(1, n):
            if X[i] == Y[j]: 
                if i == 0 or j == 0:
                    matrix[i][j] = 1
            else:
                matrix[i][j] = matrix[i-1][j-1]+1
        else:
            matrix[i][j] = 0
    return matrix

b = compute_lcs('AACTGGCAG','TACGCTGGA')
for y in b:
    print (y)

Current Output:
[0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 1, 1, 1, 1, 1, 1, 0]
[0, 1, 0, 2, 0, 2, 2, 2, 0]
[0, 1, 2, 1, 3, 0, 3, 3, 0]
[0, 1, 2, 0, 2, 4, 0, 0, 0]
[0, 1, 2, 0, 1, 3, 0, 0, 0]
[0, 1, 0, 3, 0, 2, 4, 1, 0]
[0, 0, 2, 1, 4, 1, 3, 5, 0]
[0, 1, 1, 0, 2, 5, 0, 0, 0]

Expected Output:
[0, 0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 2, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 1, 1, 0, 0, 1]
[0, 0, 1, 0, 0, 0, 2, 0, 0]
[0, 0, 0, 2, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 3, 1, 0, 0, 0]
[0, 0, 0, 0, 1, 4, 0, 0, 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0]

However my result is a matrix that shows incorrect values. When I do the matrix by hand, the correct output looks like this: Correct output. I feel like my logic makes sense, what am I doing incorrectly?

Thanks everyone.

like image 355
Chris Lin Avatar asked May 01 '26 04:05

Chris Lin


1 Answers

First off, to make things clear, the longest common subsequence problem is not the same as the longest common substring problem. What you're trying to solve is the latter; better not to confuse the two.

Secondly, your else branches are not aligned under the appropriate if conditions. Whenever the strings match X[i] == Y[j], we set the matrix element to 1 if the index i or j is 0 since i-1 or j-1 at 0 gives -1 (unfortunately, this is also the index for last item in Python) which is not what we want, otherwise we increment for higher indices i, j > 1.

Thirdly, your looping should start at 0 not 1 since we start from the first characters in the strings which are at indices 0:

def compute_lcs(X, Y):
   m = len(X)
   n = len(Y)
   # An (m) times (n) matrix
   matrix = [[0] * n for _ in range(m)]
   for i in range(0, m):
      for j in range(0, n):
          if X[i] == Y[j]: 
              if i == 0 or j == 0:
                  matrix[i][j] = 1
              else:
                  matrix[i][j] = matrix[i-1][j-1]+1
          else:
              matrix[i][j] = 0
  return matrix

To get the exact matrix shown in the expected output, you should swap the order of the arguments or transpose the matrix before printing. Note however that these are not necessary (swapping or transposing) and only serve for formatting purposes:

b = compute_lcs('TACGCTGGA', 'AACTGGCAG')
for y in b:
    print (y)


[0, 0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 2, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 0, 1, 1, 0, 0, 1]
[0, 0, 1, 0, 0, 0, 2, 0, 0]
[0, 0, 0, 2, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 3, 1, 0, 0, 1]
[0, 0, 0, 0, 1, 4, 0, 0, 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0]

like image 168
Moses Koledoye Avatar answered May 02 '26 17:05

Moses Koledoye