Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Longest repeated substring

learning python as I go for school work. Essentially, I need to find the longest repeated substring in a list of string as shown here. I have been reading through this article and have an understanding of what I should do.

so far my implementation is as follows:

def long_rptr_subString(testList):
    longSubstring = ''

    if len(testList) > 1 and len(testList[0]) > 0:
        for i in range(len(testList[0])):
            for j in range(len(testList[0])-i+1):
                if j > len(longSubstring) and all(testList[0][i:i+j] in x for x in testList):
                    longSubstring = testList[0][i:i+j]
    return longSubstring

now when I call my function with let's say ['slide', 'glidb', 'flidt', 'cridz', 'bidr'] I get the correct result of 'id' as being my longest substring.

However, when I pass the list ['slide', 'glidb', 'flidt', 'cridz', 'bidr', 'balh', 'tejka', 'djakljskdl', 'blah', 'blah', 'blah'] I don't get any return results. I should be getting back 'blah' as my longest substring, but I haven't figured out a way to accomplish this. It seems I am only able to match substrings when they are in ALL items of the list. How might I modify my code/logic to get the longest substring that occurs more than once?

Thank you.

like image 658
KidBatman Avatar asked Apr 26 '26 07:04

KidBatman


1 Answers

You can only match substrings in all items because that's exactly what you ask for:

all(testList[0][i:i+j] in x for x in testList)

Even if you change that, you can only find the longest substring that is in the first substring, because you only check through testlist[0] .

Instead, try something like:

def longest_substr(lst):
    longest = None
    for word in lst:
        for i in range(len(word)):
            for j in range(i+1, len(word)+1):
                if ((longest is None or (j - i > len(longest))) and
                    sum(word[i:j] in w for w in lst) > 1):
                    longest = word[i:j]
    return longest

This finds the longest substring that's in at least two (> 1) of the words (or will return None for an empty list or list of empty strings) and gives me the following results:

>>> longest_substr(['slide', 'glidb', 'flidt', 'cridz', 'bidr'])
'lid'
>>> longest_substr(['slide', 'glidb', 'flidt', 'cridz', 'bidr', 'balh', 'tejka', 'djakljskdl', 'blah', 'blah', 'blah'])
'blah'

Note that, once you remove the requirement that the substring must be in all strings, the longest in your first list of words is actually 'lid'.

like image 61
jonrsharpe Avatar answered Apr 27 '26 21:04

jonrsharpe



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!