unexpected result for python re.sub() with non-capturing character

Question

I cannot understand the following output:

import re 

re.sub(r'(?:\s)ff','fast-forward',' ff')
'fast-forward'

According to the documentation:

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.

So why is the whitespace included in the matched occurrence, and then replaced, even though I wrapped it in a non-capturing group?

I would like to have the following output:

' fast-forward'

Wiktor Stribiżew · Accepted Answer

The non-capturing group still matches and consumes the matched text. Note that consuming means adding the matched text to the match value (the memory buffer allotted for the whole matched substring) and advancing the regex index accordingly. So, (?:\s) puts the whitespace into the match value, and the whitespace is replaced with fast-forward together with ff.
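
One way to see this is to inspect the match object directly. The snippet below is a small illustration (not part of the original answer) that runs re.search with the same pattern and input: the overall match (group 0) contains the space, even though no numbered group exists.

```python
import re

# A non-capturing group (?:...) still consumes the text it matches:
# the whitespace becomes part of the overall match (group 0).
m = re.search(r'(?:\s)ff', ' ff')
print(repr(m.group(0)))  # ' ff'  (the space is in the match value)
print(m.span())          # (0, 3) (the match starts at the space)
print(m.groups())        # ()     (no numbered group was created)

# re.sub replaces the whole match (group 0), so the space is replaced too.
result = re.sub(r'(?:\s)ff', 'fast-forward', ' ff')
print(repr(result))      # 'fast-forward'
```

"Non-capturing" only means the text is not stored in a numbered group; it says nothing about whether the text is consumed.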

You want to use a look-behind to check for a pattern without consuming it:

re.sub(r'(?<=\s)ff','fast-forward',' ff')

See the regex demo.
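
As a rough local check (an illustrative sketch, not the linked demo), the same match object inspection shows that the lookbehind leaves the space outside the match, so only ff is replaced:

```python
import re

# A lookbehind (?<=\s) only checks that a whitespace precedes "ff";
# it does not consume it, so the match value is just "ff".
m = re.search(r'(?<=\s)ff', ' ff')
print(repr(m.group(0)))  # 'ff'
print(m.span())          # (1, 3) (the match starts after the space)

result = re.sub(r'(?<=\s)ff', 'fast-forward', ' ff')
print(repr(result))      # ' fast-forward'

# The lookbehind still requires a preceding whitespace: an "ff" at the
# very start of the string has nothing before it, so it is left unchanged.
at_start = re.sub(r'(?<=\s)ff', 'fast-forward', 'ff')
print(repr(at_start))    # 'ff'
```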

An alternative to this approach is using a capturing group around the part of the pattern one needs to keep and a replacement backreference in the replacement pattern:

re.sub(r'(\s)ff',r'\1fast-forward',' ff')
         ^  ^      ^^

Here, (\s) saves the whitespace in the Group 1 memory buffer, and \1 in the replacement retrieves it and adds it to the resulting string.
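
A minimal sketch of the backreference variant (an illustration, not from the original answer). Because \1 reinserts whatever whitespace character was actually matched, a tab, for example, is preserved as a tab rather than turned into a space:

```python
import re

# (\s) stores the matched whitespace in group 1; \1 in the replacement
# writes it back, so the whitespace survives even though it was consumed.
result = re.sub(r'(\s)ff', r'\1fast-forward', ' ff')
print(repr(result))      # ' fast-forward'

# The original character is reinserted, so a tab stays a tab.
tab_result = re.sub(r'(\s)ff', r'\1fast-forward', 'x\tff')
print(repr(tab_result))  # 'x\tfast-forward'
```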

See the Python demo:

import re 
print('"{}"'.format(re.sub(r'(?<=\s)ff','fast-forward',' ff')))
# => " fast-forward"


Tags: python, regex

plalanne

