Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python 3: Perfect Alphabetical Order

The goal of the code is to find the longest alphabetical substring within a string.

s = 'xyzbcdezzz'
longest_string = ''
current_string = ''
stringcount = 0

for n in range (len(s) - 1):
    if s[n] <= s[n+1]:
        current_string += (s[n]+s[n+1])
        stringcount += 1
        print('current string:', stringcount, current_string)


    elif s[n] > s[n+1]:
        if len(current_string) > len(longest_string) :
            longest_string = current_string
            current_string = ''
            stringcount = 0
            print('the longest string checked is:', longest_string, ', count reset')

if len(current_string) == len(longest_string):
    print (current_string, longest_string)
if len(current_string) > len(longest_string):
    print (current_string)
if len(longest_string) > len(current_string):
    print(longest_string)

When I run this code, it gives 'abbccd' as the longest substring, when it's actually 'abcd'. This is because it checks the character a, comparing it to the next in the sequence, and then adds a to b giving "ab". Then it checks b, comparing to c and adds bc together, and then adds "bc" to "ab".

To fix this, I've been attempting to make the loop skip the next character if it's in alphabetical order already, and check the next one by increasing the value of 'n' once the condition is met, but this doesn't seem to do anything at all.

Advice, tips, corrections and harsh criticism are all welcomed.

EDIT: It appears I've misled some of you, so I apologise. What I meant was that if I have a string, it extracts the longest possible substring in alphabetical order. In the case of xyzbcdezzz, it will extract 'bcdezzz' because that's the longest possible alphabetical order substring, not bcde. The problem with my current code, is that it gives bccddeezzzzz. If I could skip one loop when the first if condition is true, then I think it might work in my code.

like image 771
Daedalus Avatar asked Jun 10 '17 07:06

Daedalus


People also ask

How do you define an alphabet in Python?

The isalpha() function is a built-in function used for string handling in python, which checks if the single input character is an alphabet or if all the characters in the input string are alphabets.

Does Python have alphabetical characters?

Python String isalpha()The isalpha() method returns True if all characters in the string are alphabets. If not, it returns False.


2 Answers

TL;DR: the last code after the edit solves the problem

This is a variant of the Longest Common Substring Problem.

def longestSubstring(string1, string2):
    answer = ""
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)): answer = match
                match = ""
    return answer

alphabets = "abcdefghijklmnopqrstuvwxyz"
s = 'jamabcdaskl'

print('longest substring:', longestSubstring(s,alphabets))

Credits to this post for the subroutine.

Edit:

It seems like the above code doesn't work for all cases, so I had to redesign the function.

def longestAlphaSubstring(str2):
    str1 = "abcdefghijklmnopqrstuvwxyz"
    longest = ""
    for i in range(len(str1)+1):
        if str1[:i] in str2 and len(str1[:i])>len(longest):
            longest = str1[:i]
    return longest

print(longestAlphaSubstring('jamabcdaskl'))
print(longestAlphaSubstring('asdklfjalkdfjabcdefghijklmnopqrstuvwxyzaaabcasdkfjl;kasdf'))

Output:

abcd
abcdefghijklmnopqrstuvwxyz

This works on the assumption that the substring should always begin with a. This iterates through every possible substring from 'a', 'ab', 'abc', ... , up to the full string of alphabets and then stores the longest substring encountered in the check.

For the sake of completeness, here is the code that would work for any longest common substring:

def longestSubstring(str1, str2):
    longest = ""
    for j in range(len(str1)):
        for i in range(len(str1)+1):
            if str1[j:i] in str2 and len(str1[j:i])>len(longest):
                longest = str1[j:i]
    return longest

where one string contains the alphabets in order and the other contains the test string. Be aware that this is O(n^2) in complexity (not that it matters for small cases).

like image 60
Ébe Isaac Avatar answered Oct 05 '22 23:10

Ébe Isaac


a different version looping over zip(strg, strg[1:]) in order to compare the previous and the current character in the same loop iteration:

def longest_substring(strg):

    substring = strg[0]
    max_substring = ''

    for cur, nxt in zip(strg, strg[1:]):

        if ord(nxt) >= ord(cur):
            substring += nxt
            if len(substring) > len(max_substring):
                max_substring = substring
        else:
            substring = nxt

    return max_substring

comparing the characters with ord this way has the disadvantage that these characters !"#$%&'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_``abcdefghijklmnopqrstuvwxyz{|}~ will be considered as 'in alphabetical order'. you may need to tweak that to your needs...

like image 34
hiro protagonist Avatar answered Oct 05 '22 23:10

hiro protagonist