How to make word boundary \b not match on dashes

Tags: python, regex

I simplified my code to the specific problem I am having.

import re
pattern = re.compile(r'\bword\b')
result = pattern.sub(lambda x: "match", "-word- word")

I am getting

'-match- match'

but I want

'-word- match'

Edit: for the string "word -word-" I want

"match -word-"


asked Sep 25 '16 08:09

alpalalpal

2 Answers

What you need is a negative lookbehind.

pattern = re.compile(r'(?<!-)\bword\b')
result = pattern.sub(lambda x: "match", "-word- word")

To cite the documentation:

(?<!...) Matches if the current position in the string is not preceded by a match for ....

So the pattern matches only when the word boundary \b is not immediately preceded by a minus sign -.

If you also need to reject a minus sign directly after the word, add a negative lookahead, which looks like this: (?!-). The complete regular expression then becomes: (?<!-)\bword(?!-)\b
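As a quick check, here is a minimal sketch comparing the two patterns from this answer. The third sample string ("word- word") and the variable names are my own additions for illustration; they are not from the question. It shows a case where the lookbehind alone is not enough.

```python
import re

# Lookbehind only: rejects a dash directly before the word.
before_only = re.compile(r'(?<!-)\bword\b')
# Lookbehind plus lookahead: rejects a dash on either side of the word.
both_sides = re.compile(r'(?<!-)\bword(?!-)\b')

# The first two samples come from the question; the third is a hypothetical
# extra case with a trailing dash only.
samples = ["-word- word", "word -word-", "word- word"]

results_before = [before_only.sub("match", s) for s in samples]
results_both = [both_sides.sub("match", s) for s in samples]

for s, a, b in zip(samples, results_before, results_both):
    print(f"{s!r}: lookbehind only -> {a!r}, both lookarounds -> {b!r}")
```

Both patterns give the results asked for in the question. They differ only on the trailing-dash case, where the lookbehind-only pattern still replaces the first word.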


answered Oct 01 '22 10:10

Matthias

\b matches at any position between a word character ([a-zA-Z0-9_]) and a non-word character, so a dash creates a word boundary just as a space does. Surround word with negative lookarounds to ensure there is no non-whitespace character immediately before or after it:

re.compile(r'(?<!\S)word(?!\S)')
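The whitespace-based pattern is stricter than the dash-only one, which may or may not be what you want. The sketch below compares the two. The sample with a comma is a hypothetical input I added to show the difference, and the variable names are illustrative.

```python
import re

# Whitespace lookarounds: the word must be delimited by whitespace
# or by the start/end of the string.
space_delimited = re.compile(r'(?<!\S)word(?!\S)')
# Dash lookarounds from the other answer, for comparison.
dash_excluded = re.compile(r'(?<!-)\bword(?!-)\b')

# The first two samples come from the question; the third is a hypothetical
# case with punctuation directly after the word.
samples = ["-word- word", "word -word-", "word, word"]

results_space = [space_delimited.sub("match", s) for s in samples]
results_dash = [dash_excluded.sub("match", s) for s in samples]

for s, a, b in zip(samples, results_space, results_dash):
    print(f"{s!r}: whitespace lookarounds -> {a!r}, dash lookarounds -> {b!r}")
```

With the whitespace lookarounds, any adjacent punctuation (not only a dash) prevents a match, so "word," is left alone. If only dashes should block the match, the dash-based lookarounds are the closer fit.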

answered Oct 01 '22 10:10

revo
