This regular expression search correctly checks whether a string contains the text harry:
re.search(r'\bharry\b', '[harry] blah', re.IGNORECASE)
However, I need to ensure that the string contains [harry], brackets included. I have tried escaping with various numbers of backslashes:
re.search(r'\b\[harry\]\b', '[harry] blah', re.IGNORECASE)
re.search(r'\b\\[harry\\]\b', '[harry] blah', re.IGNORECASE)
re.search(r'\b\\\[harry\\\]\b', '[harry] blah', re.IGNORECASE)
None of these attempts finds the match. What do I need to do?
The first one is correct:
r'\b\[harry\]\b'
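But this won’t match [harry] blah, because [ is not a word character, so there is no word boundary in front of it. The leading \b would only match if there were a word character directly in front of [, as in foobar[harry] blah. The same applies to the trailing \b: it needs a word character directly after ], as in foobar[harry]blah.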
>>> re.search(r'\bharry\b','[harry] blah',re.IGNORECASE)
<_sre.SRE_Match object at 0x7f14d22df648>
>>> re.search(r'\b\[harry\]\b','[harry] blah',re.IGNORECASE)
>>> re.search(r'\[harry\]','[harry] blah',re.IGNORECASE)
<_sre.SRE_Match object at 0x7f14d22df6b0>
>>> re.search(r'\[harry\]','harry blah',re.IGNORECASE)
The problem is the \b, not the brackets. A single backslash is correct for escaping.
You escape it the way you escape most regex metacharacters: by preceding it with a backslash.
Thus, r"\[harry\]" will match the literal string [harry].
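If the literal text comes from a variable rather than being typed by hand, the standard library's re.escape can add the backslashes for you. The following is a minimal sketch, assuming Python 3; the variable names literal and pattern are only illustrative and do not come from the question.

```python
import re

# Hypothetical literal we want to find verbatim, brackets included.
literal = "[harry]"

# re.escape puts a backslash in front of the regex metacharacters,
# so the brackets are no longer read as a character class.
pattern = re.escape(literal)

match = re.search(pattern, "[Harry] blah", re.IGNORECASE)
no_match = re.search(pattern, "harry blah", re.IGNORECASE)

print(pattern)
print(match.group(0) if match else None)
print(no_match)
```

The second search returns None because the subject string has no brackets, which is the behaviour the question asks for.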
The problem is with the \b in your pattern. This is the word boundary anchor.
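The \b matches at a position between a word character \w and a non-word character \W (note the case difference). The start and end of the string also count as a boundary when the adjacent character is a word character.
The brackets [ and ] are NOT word characters, thus if a string starts with [, there is no \b to its left. Anywhere there is no \b, there is \B instead (note the case difference).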
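To make the boundary positions concrete, the sketch below lists every index where \b matches in the question's test string, then tries \B around the brackets. It assumes Python 3, and it assumes the intent is "a bracketed [harry] that is not glued to neighbouring word characters"; if any [harry] should match, dropping the anchors entirely is simpler.

```python
import re

text = "[harry] blah"

# Indices where \b matches: each one sits between a word character
# and a non-word character (or the edge of the string).
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [1, 6, 8, 12]

# \B asserts that there is NO word boundary at this position.
standalone = re.search(r"\B\[harry\]\B", text, re.IGNORECASE)
glued = re.search(r"\B\[harry\]\B", "foo[harry] blah", re.IGNORECASE)

print(standalone is not None)  # True
print(glued is None)           # True
```

Index 0 is missing from the list because the string starts with [, so there is no boundary before the opening bracket.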
\b: Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
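The two side notes in that documentation excerpt can be checked directly. This is a small sketch assuming Python 3, where str patterns use Unicode matching by default and re.ASCII restricts \w to ASCII; the quoted text describes the older UNICODE and LOCALE flags, so the flag used here is a stand-in for illustration.

```python
import re

# Inside a character class, \b means the backspace character (\x08),
# not a word boundary.
backspace = re.search(r"[\b]", "a\x08b")
print(backspace is not None)

# Whether a word boundary exists depends on what counts as \w.
accented = "\u00e9"  # the letter e with an acute accent
unicode_hit = re.search(r"\b" + accented, accented)
ascii_hit = re.search(r"\b" + accented, accented, re.ASCII)

print(unicode_hit is not None)  # accented letter is a word character
print(ascii_hit is None)        # under re.ASCII it is not, so no \b
```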