This regex search correctly checks whether a string contains the text harry:
re.search(r'\bharry\b', '[harry] blah', re.IGNORECASE)
However, I need to ensure that the string contains [harry]. I have tried escaping with various numbers of backslashes:
re.search(r'\b\[harry\]\b', '[harry] blah', re.IGNORECASE)
re.search(r'\b\\[harry\\]\b', '[harry] blah', re.IGNORECASE)
re.search(r'\b\\\[harry\\\]\b', '[harry] blah', re.IGNORECASE)
None of these attempts finds the match. What do I need to do?
The first one is correct:
r'\b\[harry\]\b'
But this won’t match [harry] blah, because [ and ] are not word characters, so there is no word boundary before [ or after ]. It would only match if there were a word character directly before [ and directly after ], as in foobar[harry]blah.
>>> re.search(r'\bharry\b','[harry] blah',re.IGNORECASE)
<_sre.SRE_Match object at 0x7f14d22df648>
>>> re.search(r'\b\[harry\]\b','[harry] blah',re.IGNORECASE)
>>> re.search(r'\[harry\]','[harry] blah',re.IGNORECASE)
<_sre.SRE_Match object at 0x7f14d22df6b0>
>>> re.search(r'\[harry\]','harry blah',re.IGNORECASE)
The problem is the \b, not the brackets. A single backslash is correct for escaping.
You escape it the way you escape most regex metacharacters: by preceding it with a backslash.
Thus, r"\[harry\]" will match a literal string [harry].
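If the text to search for is not known in advance, escaping by hand gets error-prone. A minimal sketch, assuming you want a plain case-insensitive "contains" check: re.escape builds the escaped pattern for you. The helper name contains_literal is hypothetical and is not taken from the answers above.

```python
import re

# Hypothetical helper (not from the original answers): check whether
# `haystack` contains `text` literally, ignoring case.
def contains_literal(text, haystack):
    # re.escape puts a backslash in front of regex metacharacters such as [ and ].
    pattern = re.escape(text)
    return re.search(pattern, haystack, re.IGNORECASE) is not None

print(contains_literal('[harry]', '[Harry] blah'))  # True
print(contains_literal('[harry]', 'harry blah'))    # False
```

For a fixed string like [harry], writing r'\[harry\]' by hand is just as good; the helper only pays off when the text comes from a variable.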
The problem is with the \b in your pattern. This is the word boundary anchor.
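The \b matches at the start of the string if the first character is a word character, at the end of the string if the last character is a word character, and anywhere between a word character \w and a non-word character \W (note the case difference).
The brackets [ and ] are NOT word characters, so if a string starts with [, there is no \b to its left. Anywhere there is no \b, there is \B instead (note the case difference).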
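The \b versus \B behaviour around the brackets can be checked directly. The sketch below also shows one possible way to keep a boundary-like check: negative lookarounds that reject a word character directly next to the brackets. The STANDALONE pattern and the has_standalone_tag helper are hypothetical names used for illustration; they are not part of the original answers.

```python
import re

# Hypothetical pattern (an assumption, not from the original answers):
# no word character directly before "[" or directly after "]".
STANDALONE = re.compile(r'(?<!\w)\[harry\](?!\w)', re.IGNORECASE)

def has_standalone_tag(s):
    return STANDALONE.search(s) is not None

# \b fails at the start of the string because "[" is not a word character...
print(re.search(r'\b\[harry\]', '[harry] blah'))              # None
# ...while \B succeeds at exactly that position.
print(re.search(r'\B\[harry\]', '[harry] blah') is not None)  # True

print(has_standalone_tag('[Harry] blah'))     # True
print(has_standalone_tag('foo[harry] blah'))  # False
```

Whether the lookaround version is wanted depends on the requirement; if any occurrence of [harry] should count, the plain r'\[harry\]' pattern is enough.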
\b: Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.