I simplified my code to the specific problem I am having.
import re
pattern = re.compile(r'\bword\b')
result = pattern.sub(lambda x: "match", "-word- word")
I am getting
'-match- match'
but I want
'-word- match'
edit:
Or for the string "word -word-"
I want
"match -word-"
Three positions qualify as word boundaries: before the first character of the string, if that character is a word character; after the last character of the string, if that character is a word character; and between two characters in the string, if one is a word character and the other is not.
A word boundary \b is a test, just like ^ and $. When the regexp engine (the program module that implements searching for regexps) comes across \b, it checks that the current position in the string is a word boundary.
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”.
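To see why the original pattern matches inside "-word-", it can help to list the positions where \b succeeds. The following is a minimal sketch using the question's input string; the variable names are only illustrative.

```python
import re

text = "-word- word"

# \b is zero-width, so each match is an empty string at a boundary position.
boundaries = [m.start() for m in re.finditer(r'\b', text)]
print(boundaries)

# The hyphens are non-word characters, so "word" inside "-word-" is
# surrounded by boundaries and \bword\b matches there too.
starts = [m.start() for m in re.finditer(r'\bword\b', text)]
print(starts)
```

Both occurrences of word are reported, which is why the substitution in the question replaces both.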
What you need is a negative lookbehind.
pattern = re.compile(r'(?<!-)\bword\b')
result = pattern.sub(lambda x: "match", "-word- word")
To cite the documentation:
(?<!...)
Matches if the current position in the string is not preceded by a match for ....
So this only matches if the word boundary \b is not preceded by a minus sign (-).
If you also need this check at the end of the word, use a negative lookahead, which looks like this: (?!-). The complete regular expression then becomes: (?<!-)\bword(?!-)\b
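As a quick check, the combined pattern can be run against both strings from the question. This is only a sketch; the replacement is passed as a plain string here, which behaves the same as the lambda for a fixed replacement.

```python
import re

# Reject a hyphen directly before the word (lookbehind)
# and directly after it (lookahead).
pattern = re.compile(r'(?<!-)\bword(?!-)\b')

first = pattern.sub("match", "-word- word")
second = pattern.sub("match", "word -word-")

print(first)   # -word- match
print(second)  # match -word-
```

Note that this only excludes hyphens. Other punctuation next to the word, such as a comma, still counts as a valid boundary.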
\b matches at the boundary between a word character (roughly [a-zA-Z0-9_]) and anything else, so a hyphen creates a word boundary just as a space does. Surround word with negative lookarounds to ensure there is no non-whitespace character directly before or after it:
re.compile(r'(?<!\S)word(?!\S)')
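A short sketch of how this whitespace-based pattern behaves on the question's strings follows. The third input is an extra illustrative case, not from the question, showing that this version is stricter than the hyphen-only lookarounds.

```python
import re

# Match "word" only when it is delimited by whitespace
# or by the start/end of the string.
pattern = re.compile(r'(?<!\S)word(?!\S)')

first = pattern.sub("match", "-word- word")
second = pattern.sub("match", "word -word-")
third = pattern.sub("match", "word, word")

print(first)   # -word- match
print(second)  # match -word-
print(third)   # word, match
```

Because any non-whitespace neighbour blocks the match, "word," is left alone. Whether that is desirable depends on how punctuation should be treated in the real input.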