Optional dot in regex

Tags: python, regex, python-2.7

Say I want to replace all matches of "Mr." and "Mr" with "Mister".

I am using the regex \bMr(\.)?\b to match either "Mr." or just "Mr", and then re.sub() to do the replacement.

What puzzles me is that it replaces "Mr." with "Mister." instead of "Mister". Why does it keep the dot at the end? It looks like the pattern is not matching the Mr\. case, only Mr.

import re
s="a rMr. Nobody Mr. Nobody is Mr Nobody and Mra Nobody."
re.sub(r"\bMr(\.)?\b","Mister", s)

Returns:

'a rMr. Nobody Mister. Nobody is Mister Nobody and Mra Nobody.'

I also tried the following, also without luck:

re.sub(r"\b(Mr\.|Mr)\b","Mister", s)

My desired output is:

'a rMr. Nobody Mister Nobody is Mister Nobody and Mra Nobody.'
                     ^                              ^
                     no dot            this should be kept as it is


asked Nov 13 '14 11:11

fedorqui 'SO stop harming'

2 Answers

I think you want to match 'Mr' followed by either a '.' or a word boundary:

r"\bMr(?:\.|\b)"

In use:

>>> import re
>>> re.sub(r"\bMr(?:\.|\b)", "Mister", "a rMr. Nobody Mr. Nobody is Mr Nobody and Mra Nobody.")
'a rMr. Nobody Mister Nobody is Mister Nobody and Mra Nobody.'
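
As a quick check of the pattern above, here is a minimal sketch that wraps it in a small function and runs it on the string from the question. The names MR_PATTERN and replace_mr are hypothetical, introduced only for illustration.

```python
import re

# Pattern from the answer above: "Mr" starting at a word boundary, followed by
# either a literal dot or a word boundary. (?:...) groups without capturing.
MR_PATTERN = re.compile(r"\bMr(?:\.|\b)")


def replace_mr(text):
    """Replace 'Mr' and 'Mr.' with 'Mister' (hypothetical helper)."""
    return MR_PATTERN.sub("Mister", text)


s = "a rMr. Nobody Mr. Nobody is Mr Nobody and Mra Nobody."
print(replace_mr(s))
# a rMr. Nobody Mister Nobody is Mister Nobody and Mra Nobody.
```

Note that the dot branch needs no boundary after it, so "Mr.Smith" would become "MisterSmith"; whether that is desirable depends on the input.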


answered Sep 30 '22 07:09

jonrsharpe

re.sub(r"\bMr\.|\bMr\b","Mister", s)

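Try this. You need to remove the \b after the dot.

Output: 'a rMr. Nobody Mister Nobody is Mister Nobody and Mra Nobody.'

The reason \bMr(\.)?\b does not work is that there is no word boundary between the . and the space that follows it, since neither is a word character. The trailing \b therefore fails after the dot, the regex backtracks to skip the optional dot, and \b then matches between r and . instead, so only Mr is replaced.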

There are three different positions that qualify as word boundaries:

Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
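
The three rules above can be checked directly. The following is a small sketch, assuming the sample string "Mr. N" (chosen here for illustration, not taken from the question), that lists every position where \b matches and shows what the original pattern actually consumes.

```python
import re

text = "Mr. N"

# \b is zero-width, so each match's start() is the index of a word boundary.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)
# [0, 2, 4, 5]
# 0: before "M"; 2: between "r" and "."; 4: between " " and "N"; 5: after "N".
# Index 3 (between "." and " ") is missing: neither side is a word character.

# The original pattern cannot end after the dot, so it backtracks and matches
# only "Mr", which leaves the dot in the output.
original = re.search(r"\bMr(\.)?\b", text).group()
print(repr(original))
# 'Mr'
```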

answered Sep 30 '22 05:09

vks
