
Why does re.sub replace the entire pattern, not just a capturing group within it?

re.sub('a(b)','d','abc') yields dc, not adc.

Why does re.sub replace the entire match, instead of just the capturing group '(b)'?

Nick asked Feb 08 '17 04:02




4 Answers

Because it's supposed to replace the whole occurrence of the pattern:

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.

If it were to replace only some subgroup, then complex regexes with several groups wouldn't work. There are several possible solutions:

  1. Specify the pattern in full: re.sub('ab', 'ad', 'abc'). This is my favorite, as it's very readable and explicit.
  2. Capture the groups you want to preserve and refer to them in the replacement (note that it should be a raw string so the backslash doesn't need escaping): re.sub('(a)b', r'\1d', 'abc')
  3. Similar to the previous option: provide a callback function as the repl argument; it receives the Match object and returns the replacement string.
  4. Use lookbehinds/lookaheads, which are not included in the match but affect matching: re.sub('(?<=a)b', r'd', 'abxb') yields adxb. The ?<= at the beginning of the group says "it's a lookbehind".
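
The four options above can be sketched side by side. This is a minimal illustration on the question's own input; the variable names and the keep_a function are made up for the example.

```python
import re

text = "abc"

# 1. Spell out the full pattern and the full replacement.
full = re.sub("ab", "ad", text)

# 2. Capture the part to keep and put it back with a backreference.
backref = re.sub("(a)b", r"\1d", text)

# 3. Callback: receives the Match object and returns the replacement text.
def keep_a(match):
    return match.group(1) + "d"

callback = re.sub("(a)b", keep_a, text)

# 4. Lookbehind: 'a' must precede 'b' but is not part of the match.
lookbehind = re.sub("(?<=a)b", "d", text)

print(full, backref, callback, lookbehind)  # adc adc adc adc
```

All four produce adc here; they differ mainly in how much of the surrounding text the pattern has to describe.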
yeputons answered Nov 06 '22 06:11


Because that's exactly what the re.sub() documentation tells you it's supposed to do:

  • the pattern 'a(b)' says "match 'a' immediately followed by 'b', and capture the 'b' as group 1". The parentheses only record what was matched; they do not make 'b' optional, and they do not narrow what gets replaced. (If you wanted an optional part, you would have to write it explicitly, e.g. a non-greedy (a)??b.)
  • the replacement-string is 'd'
  • hence on your string 'abc', it matches all of 'ab' and replaces it with 'd', thus the result is 'dc'

A non-greedy optional group doesn't get you 'adc' either: with '(a)??b' the engine still has to consume the 'a' at position 0 to reach the 'b', so the match is again all of 'ab':

>>> re.sub('(a)??b','d','abc')
'dc'
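
To see the difference between what the whole pattern matches and what the group captures, one can inspect the Match object directly. A small sketch; the variable names are only for illustration.

```python
import re

m = re.search("a(b)", "abc")

whole = m.group(0)       # text matched by the entire pattern
captured = m.group(1)    # text matched by the first capturing group
whole_span = m.span(0)   # offsets of the entire match
group_span = m.span(1)   # offsets of group 1 only

print(whole, captured, whole_span, group_span)  # ab b (0, 2) (1, 2)
```

re.sub always substitutes the span of group 0, which is why the 'a' disappears along with the 'b'.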
smci answered Nov 06 '22 06:11


I'm aware that this does not strictly answer the OP's question, but this topic can be hard to google (the results are flooded with \1 explanations ...)

For those who, like me, came here because they'd like to replace a capture group that is not the first one with a string, without special knowledge of the string or of the regex:

# regex is a compiled pattern; find the (start, end) offsets of the captured group within the string
r = regex.search(oldText).span(groupNb)
# slice the old string and insert replacementText in place of the group
newText = oldText[:r[0]] + replacementText + oldText[r[1]:]

I know this is the intended behavior, but I still don't understand why re.sub can't be told which capture group to substitute...
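
One way to get that behavior for every match, not just the first, is to do the slicing inside a callback passed to re.sub. This is only a sketch: replace_group is a hypothetical helper, not something the re module provides.

```python
import re

def replace_group(pattern, group, replacement, text):
    """Replace only `group` in every match of `pattern` (hypothetical helper)."""
    def splice(match):
        whole_start = match.start()
        start, end = match.span(group)
        if start == -1:  # the group did not participate in this match
            return match.group(0)
        matched = match.group(0)
        # span() offsets are relative to the full string, so shift them
        # to be relative to the matched text before slicing.
        return matched[:start - whole_start] + replacement + matched[end - whole_start:]
    return re.sub(pattern, splice, text)

first = replace_group("a(b)", 1, "d", "abc")
second = replace_group(r"(\w+)=(\d+)", 2, "0", "x=12, y=345")

print(first)   # adc
print(second)  # x=0, y=0
```

Because the callback's return value is inserted literally, the replacement string is not scanned for backreferences such as \1.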

Mr Buisson answered Nov 06 '22 07:11


import re

# Compile once with the case-insensitive flag; group 1 captures the age.
pattern = re.compile(r"I am (\d{1,2}) .*", re.IGNORECASE)

text = "i am 32 years old"

if pattern.match(text):
    print(
        pattern.sub(r"You are \1 years old.", text, count=1)
    )

In the code above, we first compile a regex pattern with the case-insensitive flag.

Then we check whether the text matches the pattern; if it does, we reference the only group in the pattern (the age) with the backreference \1 in the replacement string.
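
Note that the match check only decides whether anything is printed; the substitution itself leaves a non-matching string unchanged. A short sketch, with sample inputs chosen for illustration:

```python
import re

pattern = re.compile(r"I am (\d{1,2}) .*", re.IGNORECASE)

# The whole match is replaced; \1 puts the captured age back in.
matching = pattern.sub(r"You are \1 years old.", "i am 32 years old", count=1)

# No match, so sub() returns the input as is.
non_matching = pattern.sub(r"You are \1 years old.", "hello there", count=1)

print(matching)      # You are 32 years old.
print(non_matching)  # hello there
```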

Zilong Li answered Nov 06 '22 05:11