Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find longest unique substring in string python

I am trying that age old question (there are multitudes of versions around) of finding the longest substring of a string which doesn't contain repeated characters. I can't work out why my attempt doesn't work properly:

def findLongest(inputStr):
    resultSet = []
    substr = []

    for c in inputStr:
        print ("c: ", c)
        if substr == []:
            substr.append([c])
            continue

        print(substr)
        for str in substr:
            print ("c: ",c," - str: ",str,"\n")
            if c in str:
                resultSet.append(str)
                substr.remove(str)
            else:
                str.append(c)
        substr.append([c])



    print("Result set:")
    print(resultSet)
    return max(resultSet, key=len)

print (findLongest("pwwkewambb"))

When my output gets to the second 'w', it doesn't iterate over all the substr elements. I think I've done something silly, but I can't see what it is so some guidance would be appreciated! I feel like I'm going to kick myself at the answer...

The beginning of my output:

c:  p
c:  w
[['p']]
c:  w  - str:  ['p']

c:  w
[['p', 'w'], ['w']]
c:  w  - str:  ['p', 'w'] # I expect the next line to say c: w - str: ['w']

c:  k
[['w'], ['w']] # it is like the w was ignored as it is here
c:  k  - str:  ['w']

c:  k  - str:  ['w']
...

EDIT:

I replaced the for loop with

for idx, str in enumerate(substr):
    print ("c: ",c," - str: ",str,"\n")
    if c in str:
        resultSet.append(str)
        substr[idx] = []
    else:
        str.append(c)

and it produces the correct result. The only thing is that the empty element arrays get set with the next character. It seems a bit pointless; there must be a better way.

My expected output is kewamb.

e.g.

c:  p
c:  w
[['p']]
c:  w  - str:  ['p']

c:  w
[['p', 'w'], ['w']]
c:  w  - str:  ['p', 'w']

c:  w  - str:  ['w']

c:  k
[[], [], ['w']]
c:  k  - str:  []

c:  k  - str:  []

c:  k  - str:  ['w']

c:  e
[['k'], ['k'], ['w', 'k'], ['k']]
c:  e  - str:  ['k']

c:  e  - str:  ['k']

c:  e  - str:  ['w', 'k']

c:  e  - str:  ['k']
...
like image 603
dgBP Avatar asked Oct 06 '17 20:10

dgBP


4 Answers

Edit, per comment by @seymour on incorrect responses:

def find_longest(s):
    _longest = set()
    def longest(x):
         if x in _longest:
             _longest.clear()
             return False
         _longest.add(x)
         return True
    return ''.join(max((list(g) for _, g in groupby(s, key=longest)), key=len))

And test:

In [101]: assert find_longest('pwwkewambb') == 'kewamb'

In [102]: assert find_longest('abcabcbb') == 'abc'

In [103]: assert find_longest('abczxyabczxya') == 'abczxy'

Old answer:

from itertools import groupby

s = set() ## for mutable access

''.join(max((list(g) for _, g in groupby('pwwkewambb', key=lambda x: not ((s and x == s.pop()) or s.add(x)))), key=len))
'kewamb'

groupby returns an iterator grouped based on the function provided in the key argument, which by default is lambda x: x. Instead of the default we are utilizing some state by using a mutable structure (which could have been done a more intuitive way if using a normal function)

lambda x: not ((s and x == s.pop()) or s.add(x))

What is happening here is since I can't reassign a global assignment in a lambda (again I can do this, using a proper function), I just created a global mutable structure that I can add/remove. The key (no pun) is that I only keep elements that I need by using a short circuit to add/remove items as needed.

max and len are fairly self explanatory, to get the longest list produced by groupby

Another version without the mutable global structure business:

def longest(x):
     if hasattr(longest, 'last'):
         result = not (longest.last == x)
         longest.last = x
         return result
     longest.last = x
     return True


''.join(max((list(g) for _, g in groupby('pwwkewambb', key=longest)), key=len))
'kewamb'
like image 169
salparadise Avatar answered Oct 22 '22 18:10

salparadise


Not sure what is wrong in your attempt, but it's complex and in:

    for str in substr:
        print ("c: ",c," - str: ",str,"\n")
        if c in str:
            resultSet.append(str)
            substr.remove(str)

you're removing elements from a list while iterating on it: don't do that, it gives unexpected results.

Anyway, my solution, not sure it's intuitive, but it's probably simpler & shorter:

  • slice the string with an increasing index
  • for each slice, create a set and store letters until you reach the end of the string or a letter is already in the set. Your index is the max length
  • compute the max of this length for every iteration & store the corresponding string

Code:

def findLongest(s):
    maxlen = 0
    longest = ""
    for i in range(0,len(s)):
        subs = s[i:]
        chars = set()
        for j,c in enumerate(subs):
            if c in chars:
                break
            else:
                chars.add(c)
        else:
            # add 1 when end of string is reached (no break)
            # handles the case where the longest string is at the end
            j+=1
        if j>maxlen:
            maxlen=j
            longest=s[i:i+j]
    return longest

print(findLongest("pwwkewambb"))

result:

kewamb
like image 39
Jean-François Fabre Avatar answered Oct 22 '22 16:10

Jean-François Fabre


Depends on your definition of repeated characters: if you mean consecutive, then the approved solution is slick, but not of characters appearing more than once (e.g.: pwwkewabmb -> 'kewabmb' ).

Here's what I came up with (Python 2):

def longest(word):
    begin = 0
    end = 0
    longest = (0,0)
    for i in xrange(len(word)):
        try:
            j = word.index(word[i],begin,end)
            # longest?
            if end-begin >= longest[1]-longest[0]:
                longest = (begin,end)
            begin = j+1
            if begin==end:
                end += 1
        except:
            end = i+1
    end=i+1
    if end-begin >= longest[1]-longest[0]:
        longest = (begin,end)
    return word[slice(*longest)]

Thus

>>> print longest('pwwkewabmb')
kewabm
>>> print longest('pwwkewambb')
kewamb
>>> print longest('bbbb')
b
like image 45
M. Flanzer Avatar answered Oct 22 '22 16:10

M. Flanzer


My 2-cents:

from collections import Counter

def longest_unique_substr(s: str) -> str:

    # get all substr-ings from s, starting with the longest one
    for substr_len in range(len(s), 0, -1):
        for substr_start_index in range(0, len(s) - substr_len + 1):
            substr = s[substr_start_index : substr_start_index + substr_len]

            # check if all substr characters are unique
            c = Counter(substr)
            if all(v == 1 for v in c.values()):
                return substr

    # ensure empty string input returns ""
    return ""

Run:

In : longest_unique_substr('pwwkewambb')
Out: 'kewamb'
like image 1
gmagno Avatar answered Oct 22 '22 17:10

gmagno