A word boundary is a zero-width test between two characters. To pass the test, there must be a word character on one side, and a non-word character on the other side. It does not matter which side each character appears on, but there must be one of each.
The regular expression token "\b" is called a word boundary. It matches at the start or the end of a word. By itself, it results in a zero-length match.
For example, the / three / little / pigs / went / to / market. . . . Indivisibility: Say a sentence out loud, and ask someone to 'add extra words' to it. The extra item will be added between the words and not within them.
A word boundary \b is a test, just like ^ and $ . When the regexp engine (program module that implements searching for regexps) comes across \b , it checks that the position in the string is a word boundary.
A word boundary, in most regex dialects, is a position between \w
and \W
(non-word char), or at the beginning or end of a string if it begins or ends (respectively) with a word character ([0-9A-Za-z_]
).
So, in the string "-12"
, it would match before the 1 or after the 2. The dash is not a word character.
In the course of learning regular expression, I was really stuck in the metacharacter which is \b
. I indeed didn't comprehend its meaning while I was asking myself "what it is, what it is" repetitively. After some attempts by using the website, I watch out the pink vertical dashes at the every beginning of words and at the end of words. I got it its meaning well at that time. It's now exactly word(\w
)-boundary.
My view is merely to immensely understanding-oriented. Logic behind of it should be examined from another answers.
A word boundary can occur in one of three positions:
Word characters are alpha-numeric; a minus sign is not. Taken from Regex Tutorial.
A word boundary is a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one.
I would like to explain Alan Moore's answer
A word boundary is a position that is either preceded by a word character and not followed by one or followed by a word character and not preceded by one.
Suppose I have a string "This is a cat, and she's awesome", and I am supposed to replace all occurrence(s) the letter 'a' only if this letter exists at the "Boundary of a word", i.e. the letter a
inside 'cat' should not be replaced.
So I'll perform regex (in Python) as
re.sub(r"\ba","e", myString.strip())
//replace a
with e
so the output will be
This is e cat end she's ewesome
I talk about what \b
-style regex boundaries actually are here.
The short story is that they’re conditional. Their behavior depends on what they’re next to.
# same as using a \b before:
(?(?=\w) (?<!\w) | (?<!\W) )
# same as using a \b after:
(?(?<=\w) (?!\w) | (?!\W) )
Sometimes that isn’t what you want. See my other answer for elaboration.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With