Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursively split strings that contains a defined set of prefixes - Python

If i have a list of prefix that can be attached to a string, how do i split a string such into it's prefix and the other characters in the next substring. For example:

prefixes = ['over','under','re','un','co']

str1 = "overachieve"
output: ["over","achieve"]

str2 = "reundo"
output = ["re","un","do"]

Is there a better way to do the above task, maybe with regex or some string functions other than:

str1 = "reundo"
output = []

for x in [p for p in prefixes if p in str1]:
    output.append(x)    
    str1 =  str1.replace(x,"",1)
output.append(str1)
like image 792
alvas Avatar asked Jan 28 '13 08:01

alvas


2 Answers

Regular expressions are an efficient way to search for many alternative prefixes:

import re

def split_prefixes(word, prefixes):
    regex = re.compile('|'.join(sorted(prefixes, key=len, reverse=True)))
    result = []
    i = 0
    while True:
        mo = regex.match(word, i)
        if mo is None:
            result.append(word[i:])
            return result
        result.append(mo.group())
        i = mo.end()


>>> prefixes = ['over', 'under', 're', 'un', 'co']
>>> for word in ['overachieve', 'reundo', 'empire', 'coprocessor']:
        print word, '-->', split_prefixes(word, prefixes)

overachieve --> ['over', 'achieve']
reundo --> ['re', 'un', 'do']
empire --> ['empire']
coprocessor --> ['co', 'processor']
like image 149
Raymond Hettinger Avatar answered Nov 03 '22 19:11

Raymond Hettinger


prefixes = ['over','under','re','un','co']

def test(string, prefixes, existing=None):
    prefixes.sort(key = lambda s: len(s))
    prefixes.reverse() # This and the previous line ensure that longer prefixes are searched first regardless of initial sorting.
    if existing is None:
        existing = [] # deals with the fact that placing [] as a default parameter and modifying it modifies it for the entire session
    for prefix in prefixes:
        if string.startswith(prefix):
            existing.append(prefix)
            return test(string[len(prefix):], prefixes, existing)
    existing.append(string)
    return existing

This code runs through a string recursively, removing known prefixes until it runs out, then returning the entire list. On longer strings, the generator is probably a better route, but on shorter strings the lack of need for the additional overhead of a generator might make this a better solution.

like image 20
Drakekin Avatar answered Nov 03 '22 18:11

Drakekin