I am learning 're
' part of Python, and the named pattern (?P=name)
confused me,
When I using re.sub()
to make some exchange for digit and character, the patter '(?P=name)
' doesn't work, but the pattern '\N
' and '\g<name>
' still make sense. Code below:
[IN]print(re.sub(r'(?P<digit>\d{3})-(?P<char>\w{4})', r'(?P=char)-(?P=digit)', '123-abcd'))
[OUT] (?P=char)-(?P=digit)
[IN] print(re.sub(r'(?P<digit>\d{3})-(?P<char>\w{4})', r'\2-\1', '123-abcd'))
[OUT] abcd-123
[IN] print(re.sub(r'(?P<digit>\d{3})-(?P<char>\w{4})', r'\g<char>-\g<digit>', '123-abcd'))
[OUT] abcd-123
Why it failed to make substitute when I use (?P=name)
?
And how to use it correctly?
I am using Python 3.5
The (?P=name)
is an inline (in-pattern) backreference. You may use it inside a regular expression pattern to match the same content as is captured by the corresponding named capturing group, see the Python Regular Expression Syntax reference:
(?P=name)
A backreference to a named group; it matches whatever text was matched by the earlier group named name.
See this demo: (?P<digit>\d{3})-(?P<char>\w{4})&(?P=char)-(?P=digit)
matches 123-abcd&abcd-123
because the "digit" group matches and captures 123
, "char" group captures abcd
and then the named inline backreferences match abcd
and 123
.
To replace matches, use \1
, \g<1>
or \g<char>
syntax with re.sub
replacement pattern. Do not use (?P=name)
for that purpose:
repl can be a string or a function... Backreferences, such as
\6
, are replaced with the substring matched by group 6 in the pattern...
In string-type repl arguments, in addition to the character escapes and backreferences described above,\g<name>
will use the substring matched by the group named name, as defined by the(?P<name>...)
syntax.\g<number>
uses the corresponding group number;\g<2>
is therefore equivalent to\2
, but isn’t ambiguous in a replacement such as\g<2>0
.\20
would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference\g<0>
substitutes in the entire substring matched by the RE.
You can check the details of using and back-referencing ?P visiting:
https://docs.python.org/3/library/re.html
and using CTRL+F in your browser to look for (?P...). It comes a nice chart with all the instructions about when you can make use of ?P=name.
For this example, you're doing right at your third re.sub() call.
In the all re.sub() calls you can only use the ?P=name syntax in the first string parameter of this method and you don't need it in the second string parameter because you have the \g syntax.
In case you're confuse about the ?P=name being useful, it is, but for making a match by backreferencing an already named string.
Example: you want to match potatoXXXpotato and replace it for YYXXXYY. You could make:
re.sub(r'(?P<myName>potato)(XXX)(?P=myName)', r'YY\2YY', 'potatoXXXpotato')
or
re.sub(r'(?P<myName>potato)(?P<triple>XXX)(?P=myName)', r'YY\g<triple>YY', 'potatoXXXpotato')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With