If i have a list of prefix that can be attached to a string, how do i split a string such into it's prefix and the other characters in the next substring. For example:
prefixes = ['over','under','re','un','co']
str1 = "overachieve"
output: ["over","achieve"]
str2 = "reundo"
output = ["re","un","do"]
Is there a better way to do the above task, maybe with regex or some string functions other than:
str1 = "reundo"
output = []
for x in [p for p in prefixes if p in str1]:
    output.append(x)    
    str1 =  str1.replace(x,"",1)
output.append(str1)
                Regular expressions are an efficient way to search for many alternative prefixes:
import re
def split_prefixes(word, prefixes):
    regex = re.compile('|'.join(sorted(prefixes, key=len, reverse=True)))
    result = []
    i = 0
    while True:
        mo = regex.match(word, i)
        if mo is None:
            result.append(word[i:])
            return result
        result.append(mo.group())
        i = mo.end()
>>> prefixes = ['over', 'under', 're', 'un', 'co']
>>> for word in ['overachieve', 'reundo', 'empire', 'coprocessor']:
        print word, '-->', split_prefixes(word, prefixes)
overachieve --> ['over', 'achieve']
reundo --> ['re', 'un', 'do']
empire --> ['empire']
coprocessor --> ['co', 'processor']
                        prefixes = ['over','under','re','un','co']
def test(string, prefixes, existing=None):
    prefixes.sort(key = lambda s: len(s))
    prefixes.reverse() # This and the previous line ensure that longer prefixes are searched first regardless of initial sorting.
    if existing is None:
        existing = [] # deals with the fact that placing [] as a default parameter and modifying it modifies it for the entire session
    for prefix in prefixes:
        if string.startswith(prefix):
            existing.append(prefix)
            return test(string[len(prefix):], prefixes, existing)
    existing.append(string)
    return existing
This code runs through a string recursively, removing known prefixes until it runs out, then returning the entire list. On longer strings, the generator is probably a better route, but on shorter strings the lack of need for the additional overhead of a generator might make this a better solution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With