Using re.sub in Python 2.7, the following example uses a simple backreference:
re.sub('-{1,2}', r'\g<0> ', 'pro----gram-files')
It outputs the following string as expected:
'pro-- -- gram- files'
I would expect the following example to be identical, but it is not:
def dashrepl(matchobj):
    return r'\g<0> '
re.sub('-{1,2}', dashrepl, 'pro----gram-files')
This gives the following unexpected output:
'pro\\g<0> \\g<0> gram\\g<0> files'
Why do the two examples give different output? Did I miss something in the documentation that explains this? Is there any particular reason that this behavior is preferable to what I expected? Is there a way to use backreferences in a replacement function?
There are simpler ways to achieve your goal.
As you have already seen, your replacement function receives a match object as its argument.
Among other things, this object has a group() method, which you can use instead:
def dashrepl(matchobj):
    return matchobj.group(0) + ' '
which gives exactly the result you expected.
But you are completely right that the docs are a bit confusing on this point. They describe the repl argument as follows:
repl can be a string or a function; if it is a string, any backslash escapes in it are processed.
and
If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
You could read this as implying that "the replacement string" returned by the function also undergoes backslash-escape processing.
But since that processing is described only for the case where repl "is a string", the intended meaning becomes clearer, though it is not obvious at first glance.
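To make the difference concrete, here is a minimal sketch comparing the three variants. It also uses the match object's expand() method, which applies the same template substitution that re.sub performs on a string replacement, so it is one way to use backreferences inside a replacement function. The names dashrepl_literal and dashrepl_expand are only illustrative.

```python
import re

def dashrepl_literal(matchobj):
    # The returned string is inserted as-is; no template processing happens.
    return r'\g<0> '

def dashrepl_expand(matchobj):
    # expand() resolves \g<0> against this particular match.
    return matchobj.expand(r'\g<0> ')

text = 'pro----gram-files'

template_result = re.sub('-{1,2}', r'\g<0> ', text)         # string replacement
literal_result = re.sub('-{1,2}', dashrepl_literal, text)   # function, no expansion
expanded_result = re.sub('-{1,2}', dashrepl_expand, text)   # function with expand()

print(template_result)
print(literal_result)
print(expanded_result)
```

The first and third calls should produce the same string, while the second keeps the literal \g<0> text.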
If you pass a function to re.sub, the match is replaced with the string returned from that function. Basically, re.sub uses different code paths depending on whether you pass a function or a string. And yes, this is in fact desirable. Consider the case where you want to replace matches of foo with bar and matches of baz with qux. You can then write it as:
repdict = {'foo': 'bar', 'baz': 'qux'}
re.sub('foo|baz', lambda match: repdict[match.group(0)], 'foo')
You could argue that you could do this in two passes, but you can't do that if repdict looks like {'foo':'baz','baz':'qux'}, because the second pass would also replace the baz produced by the first.
And I don't think you can do that with back-references (at least not easily).
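As a rough illustration of why the single pass matters, the sketch below runs both approaches on the overlapping dictionary. The input string 'foo baz' is an assumed example, not taken from the question.

```python
import re

repdict = {'foo': 'baz', 'baz': 'qux'}
text = 'foo baz'  # assumed sample input

# Single pass: every match is looked up exactly once.
single_pass = re.sub('foo|baz', lambda match: repdict[match.group(0)], text)

# Two passes: the 'baz' created by the first pass is replaced again.
two_pass = text.replace('foo', 'baz').replace('baz', 'qux')

print(single_pass)
print(two_pass)
```

With the function-based call, the original foo ends up as baz; with two passes it ends up as qux, which is not what the mapping asks for.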