
How can I pass a callback to re.sub and still insert match captures?

Tags:

python

regex

Consider:

import random
import re

text = "abcdef"
pattern = "(b|e)cd(b|e)"

repl = [r"\1bla\2", r"\1blabla\2"]
text = re.sub(pattern, lambda m: random.choice(repl), text)

I want to replace each match with a randomly chosen entry from the list repl. But when I use lambda m: random.choice(repl) as the callback, \1, \2, etc. are no longer replaced with their captures, and the result contains "\1bla\2" as plain text.

I looked through re.py to see how this is done internally, hoping I could call the same internal function, but it doesn't seem trivial.

The example above returns a\1bla\2f or a\1blabla\2f, whereas the results I want are abblaef or abblablaef.

Note that I'm using a function, because, in case of several matches like text = "abcdef abcdef", it should randomly choose a replacement from repl for every match – instead of using the same replacement for all matches.

ScientiaEtVeritas asked Mar 16 '20 04:03



1 Answer

If you pass a function, you lose the automatic expansion of backreferences. The function just receives the match object and has to build the replacement itself. So you could:

Pick a replacement string up front rather than passing a function (the same string is then used for every match):

text = "abcdef"
pattern = "(b|e)cd(b|e)"

repl = [r"\1bla\2", r"\1blabla\2"]
re.sub(pattern, random.choice(repl), text)
# 'abblaef' or 'abblablaef'
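A minimal sketch of why this first option does not meet the per-match requirement from the question: random.choice(repl) is evaluated once, before re.sub runs, so with several matches every one of them gets the same template. The names result, first and second are only used for illustration here.

```python
import random
import re

text = "abcdef abcdef"
pattern = "(b|e)cd(b|e)"
repl = [r"\1bla\2", r"\1blabla\2"]

# random.choice(repl) runs a single time here, so re.sub receives
# one fixed template string and applies it to both matches.
result = re.sub(pattern, random.choice(repl), text)

# Both words always come out identical, whichever template was picked.
first, second = result.split()
print(result, first == second)
```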

Or write a function that receives the match object, which allows more complex processing. You can use the match object's expand method to resolve backreferences in a template:

text = "abcdef abcdef"
pattern = "(b|e)cd(b|e)"

def repl(m):
    # Pick a template per match, then let expand fill in \1 and \2.
    templates = [r"\1bla\2", r"\1blabla\2"]
    return m.expand(random.choice(templates))


re.sub(pattern, repl, text)

# 'abblaef abblablaef' and variations

You can, of course, write that function as a lambda:

repl = [r"\1bla\2", r"\1blabla\2"]
re.sub(pattern, lambda m: m.expand(random.choice(repl)), text)
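If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name sub_random_template and its rng parameter are assumptions made for illustration, not part of the re module. Passing a seeded random.Random instance makes the choices reproducible, which can help when testing.

```python
import random
import re


def sub_random_template(pattern, templates, text, rng=random):
    """Replace each match with a randomly chosen, expanded template.

    Hypothetical helper: rng only needs a choice() method, so either
    the random module or a random.Random instance works.
    """
    # The callback runs once per match, so each match gets its own pick;
    # Match.expand then resolves backreferences such as \1 and \2.
    return re.sub(pattern, lambda m: m.expand(rng.choice(templates)), text)


templates = [r"\1bla\2", r"\1blabla\2"]
pattern = "(b|e)cd(b|e)"

# Two runs with the same seed produce the same sequence of choices.
out_a = sub_random_template(pattern, templates, "abcdef abcdef", random.Random(0))
out_b = sub_random_template(pattern, templates, "abcdef abcdef", random.Random(0))
print(out_a, out_a == out_b)
```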
Mark answered Sep 30 '22 05:09
