The goal of the code is to find the longest alphabetical substring within a string.
s = 'xyzbcdezzz'
longest_string = ''
current_string = ''
stringcount = 0
for n in range (len(s) - 1):
if s[n] <= s[n+1]:
current_string += (s[n]+s[n+1])
stringcount += 1
print('current string:', stringcount, current_string)
elif s[n] > s[n+1]:
if len(current_string) > len(longest_string) :
longest_string = current_string
current_string = ''
stringcount = 0
print('the longest string checked is:', longest_string, ', count reset')
if len(current_string) == len(longest_string):
print (current_string, longest_string)
if len(current_string) > len(longest_string):
print (current_string)
if len(longest_string) > len(current_string):
print(longest_string)
When I run this code, it gives 'abbccd' as the longest substring, when it's actually 'abcd'. This is because it checks the character a, comparing it to the next in the sequence, and then adds a to b giving "ab". Then it checks b, comparing to c and adds bc together, and then adds "bc" to "ab".
To fix this, I've been attempting to make the loop skip the next character if it's in alphabetical order already, and check the next one by increasing the value of 'n' once the condition is met, but this doesn't seem to do anything at all.
Advice, tips, corrections and harsh criticism are all welcomed.
EDIT: It appears I've misled some of you, so I apologise. What I meant was that if I have a string, it extracts the longest possible substring in alphabetical order. In the case of xyzbcdezzz, it will extract 'bcdezzz' because that's the longest possible alphabetical order substring, not bcde. The problem with my current code, is that it gives bccddeezzzzz. If I could skip one loop when the first if condition is true, then I think it might work in my code.
The isalpha() function is a built-in function used for string handling in python, which checks if the single input character is an alphabet or if all the characters in the input string are alphabets.
Python String isalpha()The isalpha() method returns True if all characters in the string are alphabets. If not, it returns False.
TL;DR: the last code after the edit solves the problem
This is a variant of the Longest Common Substring Problem.
def longestSubstring(string1, string2):
answer = ""
len1, len2 = len(string1), len(string2)
for i in range(len1):
match = ""
for j in range(len2):
if (i + j < len1 and string1[i + j] == string2[j]):
match += string2[j]
else:
if (len(match) > len(answer)): answer = match
match = ""
return answer
alphabets = "abcdefghijklmnopqrstuvwxyz"
s = 'jamabcdaskl'
print('longest substring:', longestSubstring(s,alphabets))
Credits to this post for the subroutine.
Edit:
It seems like the above code doesn't work for all cases, so I had to redesign the function.
def longestAlphaSubstring(str2):
str1 = "abcdefghijklmnopqrstuvwxyz"
longest = ""
for i in range(len(str1)+1):
if str1[:i] in str2 and len(str1[:i])>len(longest):
longest = str1[:i]
return longest
print(longestAlphaSubstring('jamabcdaskl'))
print(longestAlphaSubstring('asdklfjalkdfjabcdefghijklmnopqrstuvwxyzaaabcasdkfjl;kasdf'))
Output:
abcd
abcdefghijklmnopqrstuvwxyz
This works on the assumption that the substring should always begin with a
. This iterates through every possible substring from 'a', 'ab', 'abc', ... , up to the full string of alphabets and then stores the longest substring encountered in the check.
For the sake of completeness, here is the code that would work for any longest common substring:
def longestSubstring(str1, str2):
longest = ""
for j in range(len(str1)):
for i in range(len(str1)+1):
if str1[j:i] in str2 and len(str1[j:i])>len(longest):
longest = str1[j:i]
return longest
where one string contains the alphabets in order and the other contains the test string. Be aware that this is O(n^2) in complexity (not that it matters for small cases).
a different version looping over zip(strg, strg[1:])
in order to compare the previous and the current character in the same loop iteration:
def longest_substring(strg):
substring = strg[0]
max_substring = ''
for cur, nxt in zip(strg, strg[1:]):
if ord(nxt) >= ord(cur):
substring += nxt
if len(substring) > len(max_substring):
max_substring = substring
else:
substring = nxt
return max_substring
comparing the characters with ord
this way has the disadvantage that these characters !"#$%&'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_``abcdefghijklmnopqrstuvwxyz{|}~
will be considered as 'in alphabetical order'. you may need to tweak that to your needs...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With