Python re.sub back reference not back referencing [duplicate]

Tags:

python

regex

People also ask

Does re sub replace all occurrences?

Yes. The count argument defaults to zero, which means re.sub() replaces every occurrence of the pattern in the target string. Pass a positive count to limit the number of replacements.

What is Backreference in regular expression Python?

A backreference lets you refer to a capturing group from within a regular expression or a replacement string. The syntax is \N, where N is 1, 2, 3, etc. and identifies the corresponding capturing group. In a replacement string, \g<0> refers to the entire match.
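A minimal sketch of both uses follows: \1 inside a pattern, and \g<0> inside a replacement. The input strings and variable names are invented for illustration.

```python
import re

# Inside the pattern, \1 must match the same text that group 1 captured,
# so this finds a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
repeated_word = doubled.group(1)

# Inside the replacement, \g<0> stands for the entire match.
bracketed = re.sub(r"\d+", r"[\g<0>]", "order 66 and 99")

print(repeated_word)  # is
print(bracketed)      # order [66] and [99]
```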

What does re sub () do?

re.sub() replaces the parts of a string that match a pattern with a replacement. It takes the pattern to search for, the replacement, and the string to process.

How do you use re.sub in Python?

To replace text that matches a regular expression (regex) rather than an exact substring, use sub() from the re module. In re.sub(), pass the regex pattern as the first argument, the replacement string as the second, and the string to be processed as the third.


You need to use a raw string for the replacement so that Python doesn't process the backslash as an escape character before re.sub sees it:

>>> import re
>>> fileText = '<text top="52" left="20" width="383" height="15" font="0"><b>test</b></text>'
>>> fileText = re.sub("<b>(.*?)</b>", r"\1", fileText, flags=re.DOTALL)
>>> fileText
'<text top="52" left="20" width="383" height="15" font="0">test</text>'
>>>

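Notice how "\1" was changed to r"\1". It is a very small change (one character), but it has a big effect: without the r prefix, Python reads \1 as an octal escape and produces the single character '\x01', so re.sub never receives a backreference. See below: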

>>> "\1"
'\x01'
>>> r"\1"
'\\1'
>>>
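
For comparison, here is a short sketch of the broken call next to three spellings that do pass a backreference to re.sub: a raw string, a doubled backslash, and the \g<1> form. The shortened input and the variable names are chosen just for this example.

```python
import re

html = "<b>test</b>"      # shortened sample input
pattern = "<b>(.*?)</b>"

# Non-raw "\1" is the one-character string '\x01', not a backreference
broken = re.sub(pattern, "\1", html)

# Each of these hands the two characters backslash + 1 (or \g<1>) to re.sub
raw = re.sub(pattern, r"\1", html)
escaped = re.sub(pattern, "\\1", html)
explicit = re.sub(pattern, r"\g<1>", html)

print(repr(broken))            # '\x01'
print(raw, escaped, explicit)  # test test test
```

The \g<1> form is also handy when the group reference is followed by a digit, since r"\g<1>0" is unambiguous while r"\10" would be read as group 10.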