Say I want to replace all the matches of 'Mr.' and 'Mr' with 'Mister'.
I am using the following regex: \bMr(\.)?\b to match either 'Mr.' or just 'Mr'. Then, I use the re.sub() method to do the replacement.
What is puzzling me is that it is replacing 'Mr.' with 'Mister.' (dot included). Why is this keeping the dot '.' at the end? It looks like it is not matching the Mr\. case but just Mr.
import re
s="a rMr. Nobody Mr. Nobody is Mr Nobody and Mra Nobody."
re.sub(r"\bMr(\.)?\b","Mister", s)
Returns:
'a rMr. Nobody Mister. Nobody is Mister Nobody and Mra Nobody.'
I also tried with the following, but also without luck:
re.sub(r"\b(Mr\.|Mr)\b","Mister", s)
My desired output is:
'a rMr. Nobody Mister Nobody is Mister Nobody and Mra Nobody.'
Note: no dot after 'Mister'; everything else should be kept as it is.
Except for JavaScript and VBScript, all regex flavors discussed here have an option to make the dot match all characters, including line breaks. In PowerGREP, tick the checkbox labeled “dot matches line breaks” to make the dot match all characters. In EditPad Pro, turn on the “Dot” or “Dot matches newline” search option.
This exception exists mostly for historical reasons. The first tools that used regular expressions were line-based. They would read a file line by line and apply the regular expression separately to each line. As a result, the string these tools worked on could never contain line breaks, so the dot could never match them.
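In Python, the option that makes the dot match line breaks is the re.DOTALL flag. A minimal sketch of the difference, with a made-up two-line string as the test input:

```python
import re

# Hypothetical sample text containing one line break.
text = "first line\nsecond line"

# By default the dot stops at a line break.
default_match = re.match(r".*", text).group(0)

# With re.DOTALL the dot also matches "\n".
dotall_match = re.match(r".*", text, flags=re.DOTALL).group(0)

print(repr(default_match))
print(repr(dotall_match))
```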
? as a metacharacter here means zero or one repetition. It simply checks whether that particular character is present or not: it makes the character optional, so the regex matches the character if it is there and still matches if it is absent from the test string. For zero or more repetitions, * is used.
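A short sketch of the optional quantifier in Python. The pattern colou?r and the test words are illustrative choices, not taken from the question:

```python
import re

# "u?" makes the "u" optional: zero or one occurrence is accepted.
optional_u = re.compile(r"colou?r")

for word in ["color", "colour", "colouur"]:
    # fullmatch succeeds only if the whole word fits the pattern.
    print(word, bool(optional_u.fullmatch(word)))

# "*" accepts zero or more occurrences instead, so "colouur" fits here.
star_matches = bool(re.fullmatch(r"colou*r", "colouur"))
print(star_matches)
```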
I think you want to match 'Mr' followed by either a '.' or a word boundary:
r"\bMr(?:\.|\b)"
In use:
>>> import re
>>> re.sub(r"\bMr(?:\.|\b)", "Mister", "a rMr. Nobody Mr. Nobody is Mr Nobody and Mra Nobody.")
'a rMr. Nobody Mister Nobody is Mister Nobody and Mra Nobody.'
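For comparison, another way to write it, which is an assumption of mine rather than part of the answer above, is to require the word boundary right after 'Mr' and then consume the dot only if it is there. On the sample string it should give the same result:

```python
import re

s = "a rMr. Nobody Mr. Nobody is Mr Nobody and Mra Nobody."

# Alternative spelling (not the pattern from the answer above):
# \bMr\b requires "Mr" to be a whole word, then \.? swallows a trailing dot if present.
alt_result = re.sub(r"\bMr\b\.?", "Mister", s)
print(alt_result)
```

Both patterns leave 'rMr.' and 'Mra' alone, because neither has a word boundary on both sides of 'Mr'.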
re.sub(r"\bMr\.|\bMr\b","Mister", s)
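Try this. You need to remove the \b after the \.
Output: 'a rMr. Nobody Mister Nobody is Mister Nobody and Mra Nobody.'
The reason \bMr(\.)?\b is not working is that there is no word boundary between '.' and a space, since both are non-word characters. The regex engine therefore backtracks, leaves the dot out of the match, and uses the boundary between 'r' and '.' instead.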
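A quick way to see this for yourself, sketched with the string from the question: list what the original pattern actually matches, then test \.\b on its own.

```python
import re

s = "a rMr. Nobody Mr. Nobody is Mr Nobody and Mra Nobody."

# What the original pattern actually matches: the dot is never included.
original_matches = [m.group(0) for m in re.finditer(r"\bMr(\.)?\b", s)]
print(original_matches)

# "\.\b" needs a word character directly next to the dot.
dot_then_space = re.search(r"\.\b", "Mr. Nobody")   # "." followed by " "
dot_then_letter = re.search(r"\.\b", "Mr.Nobody")   # "." followed by "N"
print(dot_then_space)
print(dot_then_letter is not None)
```

The first search finds nothing because '.' and ' ' are both non-word characters. The second succeeds because 'N' is a word character.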
There are three different positions that qualify as word boundaries:
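As \b is usually described, those positions are: before the first character of the string if it is a word character, after the last character if it is a word character, and between two characters where one is a word character and the other is not. A small sketch that lists every position where \b succeeds, with 'Mr. Nobody' as an example string of my choosing:

```python
import re

text = "Mr. Nobody"

# \b matches an empty string, so m.start() is the boundary position itself.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)

# Expected positions:
#   0  -> before the first character ("M" is a word character)
#   2  -> between "r" (word) and "." (non-word)
#   4  -> between " " (non-word) and "N" (word)
#   10 -> after the last character ("y" is a word character)
# Position 3, between "." and " ", is missing: both are non-word characters.
```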