How to find the longest common substring between two strings using Python?

Question

I want to write a Python code that computes the longest common substring between two strings from the input.

Example:

word1 = input('Give 1. word: xlaqseabcitt')
word2 = input('Give 2. word: peoritabcpeor')

Wanted output:

abc

I have code like this so far:

word1 = input("Give 1. word: ") 
word2 = input("Give 2. word: ")

longestSegment = "" 
tempSegment = ""

for i in range(len(word1)): 
    if word1[i] == word2[i]:
        tempSegment += word1[i] 
    else: 
        tempSegment = ""

if len(tempSegment) > len(longestSegment):
    longestSegment = tempSegment

print(longestSegment)

I end up with IndexError when word2 is shorter than word1, and it does not give me the common substring.

EDIT: I found this solution:

string1 = input('Give 1. word: ')
string2 = input('Give 2. word: ')
answer = ""
len1, len2 = len(string1), len(string2)
for i in range(len1):
    for j in range(len2):
        lcs_temp=0
        match=''
        while ((i+lcs_temp < len1) and (j+lcs_temp<len2) and   string1[i+lcs_temp] == string2[j+lcs_temp]):
            match += string2[j+lcs_temp]
            lcs_temp+=1
        if (len(match) > len(answer)):
            answer = match

print(answer)

However, I would like to see a library function call that could be used to compute the longest common substring between two strings.

Alternatively, please suggest a more concise code to achieve the same.

Vladimir Fokow · Accepted Answer

A super fast library is available for Python: `pylcs`

It can find the indices of the longest common substring (LCS) between 2 strings, and can do some other related tasks as well.

A function to return the LCS using this library consists of 2 lines:

import pylcs

def find_LCS(s1, s2):
    res = pylcs.lcs_string_idx(s1, s2)
    return ''.join([s2[i] for i in res if i != -1])

Example:

s1 = 'bbbaaabaa'
s2 = 'abaabaab'
print(find_LCS(s1, s2))

aabaa

Explanation:

In this example res is:

[-1, -1, -1, -1, 2, 3, 4, 5, 6]

It is a mapping of all characters in s1 - to the indices of characters in s2 of the LCS. -1 indicates that the character of s1 is NOT part of the LCS.

The reasons behind the speed and efficiency of this library are that it's implemented in C++ and uses dynamic programming.

Alain T. · Answer

You can build a dictionary from the first string containing the positions of each character, keyed on the characters. Then go through the second string and compare the substring of each character with the rest of the second string at that position:

# extract common prefix
def common(A,B) :
    firstDiff = (i for i,(a,b) in enumerate(zip(A,B)) if a!=b) # 1st difference
    commonLen = next(firstDiff,min(len(A),len(B)))             # common length
    return A[:commonLen]

word1 = "xlaqseabcitt"
word2 = "peoritabcpeor"

# position(s) of each character in word1
sub1 = dict()  
for i,c in enumerate(word1): sub1.setdefault(c,[]).append(i)

# maximum (by length) of common prefixes from matching first characters
maxSub = max((common(word2[i:],word1[j:]) 
                       for i,c in enumerate(word2) 
                                 for j in sub1.get(c,[])),key=len)


print(maxSub) # abc

How to find the longest common substring between two strings using Python?

Tags:

python

Emiiri93

2 Answers

A super fast library is available for Python: `pylcs`

Example:

Explanation:

Vladimir Fokow

Alain T.

Recent Activity

Donate For Us

How to find the longest common substring between two strings using Python?

Tags:

python

Emiiri93

2 Answers

A super fast library is available for Python: pylcs

Example:

Explanation:

Vladimir Fokow

Alain T.

Related questions

Recent Activity

Donate For Us

A super fast library is available for Python: `pylcs`