
String coverage optimization in Python

I have this initial string.

'bananaappleorangestrawberryapplepear'

I also have a tuple of strings:

('apple', 'plepe', 'leoran', 'lemon')

I want a function that, given the initial string and the tuple, replaces every character covered by an occurrence of any of the words with 'x', producing this:

'bananaxxxxxxxxxgestrawberryxxxxxxxar'

I know how to do it imperatively: find every occurrence of each word in the initial string, then loop character by character over the whole string, replacing the covered characters.

But that is ugly and not very efficient. I suspect there is a more elegant, functional way of doing this, with itertools or something similar. If you know a Python library that can do this efficiently, please let me know.

UPDATE: Justin Peel pointed out a case I didn't describe in my initial question. If a word is 'aaa' and 'aaaaaa' is in the initial string, the output should look like 'xxxxxx'.
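
For comparison, the imperative approach described above might look like the following minimal sketch. `cover` is a hypothetical name, and stepping `str.find` forward one character at a time (so that overlapping occurrences count) is an assumption about the intended behavior.

```python
def cover(s, words, fill='x'):
    # mask[i] is True when character i lies inside an occurrence of some word.
    mask = [False] * len(s)
    for w in words:
        if not w:
            continue  # an empty word covers nothing
        i = s.find(w)
        while i != -1:
            mask[i:i + len(w)] = [True] * len(w)
            # Step forward by one so overlapping occurrences are also found.
            i = s.find(w, i + 1)
    return ''.join(fill if hit else c for c, hit in zip(s, mask))

print(cover('bananaappleorangestrawberryapplepear',
            ('apple', 'plepe', 'leoran', 'lemon')))
# bananaxxxxxxxxxgestrawberryxxxxxxxar
print(cover('aaaaaa', ('aaa',)))
# xxxxxx
```

The boolean mask keeps "which characters are covered" separate from building the output, so words that overlap each other need no special handling.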

Danny Navarro asked Nov 13 '10 17:11



2 Answers

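import re

words = ('apple', 'plepe', 'leoran', 'lemon')
s = 'bananaappleorangestrawberryapplepear'

# Indices of every character covered by some occurrence of some word.
x = set()

for w in words:
    # re.escape treats the word literally; the lookahead (?=...) makes each
    # match zero-width, so overlapping occurrences of the same word are found too.
    for m in re.finditer('(?=%s)' % re.escape(w), s):
        i = m.start()
        x.update(range(i, i + len(w)))

result = ''.join(('x' if i in x else s[i]) for i in range(len(s)))
print(result)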

produces:

bananaxxxxxxxxxgestrawberryxxxxxxxar
Ned Batchelder answered Oct 11 '22 03:10



Here's another answer. There might be a faster way to replace the letters with x's, but I don't think that it is necessary because this is already pretty fast.

import re

def do_xs(s, pats):
    # One pattern that matches any of the inputs; re.escape keeps them literal.
    pat = re.compile('|'.join(re.escape(p) for p in pats))

    sout = list(s)
    match = pat.search(s)
    while match:
        start, end = match.span()
        sout[start:end] = ['x'] * (end - start)
        # Resume one character after the match start so overlapping matches are found.
        match = pat.search(s, start + 1)
    return ''.join(sout)

txt = 'bananaappleorangestrawberryapplepear'
pats = ('apple', 'plepe', 'leoran', 'lemon')
print(do_xs(txt, pats))

Basically, I create a regex pattern that matches any of the input patterns. Then I keep restarting the search one character after the start of the most recent match, so overlapping matches are found. There might be a problem, though, if one of the input patterns is a prefix of another: alternation tries the alternatives in order, so the shorter pattern can match first and the tail of the longer one is left uncovered.
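
One possible way around the prefix problem, as a minimal sketch rather than part of the original answer: sort the patterns longest first before joining them, so that at any start position the alternation tries the longest candidate first. `do_xs_longest_first` is a hypothetical name, and the `'app'`/`'apple'` input is an assumed example.

```python
import re

def do_xs_longest_first(s, pats):
    # Assumption: trying longer patterns first makes the alternation pick the
    # longest pattern that matches at each start position.
    ordered = sorted(pats, key=len, reverse=True)
    pat = re.compile('|'.join(re.escape(p) for p in ordered))

    sout = list(s)
    match = pat.search(s)
    while match:
        start, end = match.span()
        sout[start:end] = ['x'] * (end - start)
        # Restart one character after the match start to catch overlaps.
        match = pat.search(s, start + 1)
    return ''.join(sout)

print(do_xs_longest_first('bananaapplepie', ('app', 'apple')))
# bananaxxxxxpie
```

With the patterns joined in their given order instead, `'app'` would match first at that position and the trailing `'le'` of `'apple'` would be left in place.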

Justin Peel answered Oct 11 '22 01:10
