I am trying to find the longest common subsequence between two strings.
I watched this tutoial https://www.youtube.com/watch?v=NnD96abizww
and wrote:
# Longest Common Subsequence
def lcs(s1, s2):
matrix = [ [0 for x in range(len(s2))] for x in range(len(s1)) ]
cs = ""
for i in range(len(s1)):
for j in range(len(s2)):
if s1[i]==s2[j]:
if i==0 or j==0:
matrix[i][j] = 1
cs += s1[i]
else:
matrix[i][j] = matrix[i-1][j-1] + 1
cs += s1[i]
else:
if i==0 or j==0:
matrix[i][j] = 0
else:
matrix[i][j] = max(matrix[i-1][j], matrix[i][j-1])
return matrix[len(s1)-1][len(s2)-1], cs
print(lcs("abcdaf", "acbcf"))
I get (3, 'abccaf')
This is clearly wrong it should be 4 abcf.
Not sure what step is going wrong. One general question is how long does it take usually for programmers to "get" these kind of questions?
LCS Problem Statement: Given two sequences, find the length of longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous. For example, “abc”, “abg”, “bdf”, “aeg”, '”acefg”, .. etc are subsequences of “abcdefg”.
To find the number of common subsequences in two string, say S and T, we use Dynamic Programming by defining a 2D array dp[][], where dp[i][j] is the number of common subsequences in the string S[0…i-1] and T[0…. j-1].
The Longest Increasing Subsequence (LIS) problem is to find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order. For example, the length of LIS for {10, 22, 9, 33, 21, 50, 41, 60, 80} is 6 and LIS is {10, 22, 33, 50, 60, 80}.
Using Dynamic Programming to find the LCSCreate a table of dimension n+1*m+1 where n and m are the lengths of X and Y respectively. The first row and the first column are filled with zeros. Fill each cell of the table using the following logic.
There are 2 main problems with your code that cause the algorithm to output the wrong answer.
if i == 0 or j == 0
in line 16Just following the video shows that this line makes no sense when s1[1] != s2[j]
, because the longest common subsequence of "ab" and "a" has length 1 although your algorithm sets matrix[0][1] = 0
for this example. So you need to remove this if statement. While your at it you have to consider what max(matrix[i-1][j], matrix[i][j-1])
does for i == 0
or j == 0
. Now there are two different approaches:
The explicit one:
max(matrix[i-1][j] if i != 0 else 0,
matrix[i][j-1] if j != 0 else 0)
The implicit one:
max(matrix[i-1][j], matrix[i][j-1])
This one works, because in Python negative indices are used to get the last item of a list and these items are 0 in this case.
cs += s1[i]
in line 11/14For example if you found that the longest common subsequence of "a" and "abcd" is "a", your algorithm sets the longest common subsequence for "a" and "abcda" as "aa", which doesn't make sense. I am struggling to explain why it does not work like that, so i suggest you look at a few examples, maybe using http://pythontutor.com/visualize.html
To approach both problems you can use the matrix to store the longest common subsequence you found for the smaller problems. You end up with this:
def lcs(s1, s2):
matrix = [["" for x in range(len(s2))] for x in range(len(s1))]
for i in range(len(s1)):
for j in range(len(s2)):
if s1[i] == s2[j]:
if i == 0 or j == 0:
matrix[i][j] = s1[i]
else:
matrix[i][j] = matrix[i-1][j-1] + s1[i]
else:
matrix[i][j] = max(matrix[i-1][j], matrix[i][j-1], key=len)
cs = matrix[-1][-1]
return len(cs), cs
print(lcs("abcdaf", "acbcf"))
This specific implementation only returns one possible result. You can try to implement an algorithm that gives all of the longest common sequences as an exercise. Maybe have a look at the Wikipedia page as suggested by גלעד ברקן
There is obviously no clear answer. It always helps to think about examples and in the case of algorithms Wikipedia often has a good pseudocode, which you can base your implementations on. When you are familiar with the concepts and datastructures involved in the algorithm, you should be able to implement it within a day, I would say (but I am definitely no expert). In general searching for logical bugs in your code, can take multiple days, depending on the size of the code. To practice this kind of structured, algorithmic and mathematical thinking I can highly recommend projecteuler.net.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With