Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a word boundary in regex?

People also ask

What is a word boundary character?

A word boundary is a zero-width test between two characters. To pass the test, there must be a word character on one side, and a non-word character on the other side. It does not matter which side each character appears on, but there must be one of each.

What is word boundary in regex Java?

The regular expression token "\b" is called a word boundary. It matches at the start or the end of a word. By itself, it results in a zero-length match.

What are word boundaries examples?

For example, the / three / little / pigs / went / to / market. . . . Indivisibility: Say a sentence out loud, and ask someone to 'add extra words' to it. The extra item will be added between the words and not within them.

What is word boundary \B?

A word boundary \b is a test, just like ^ and $ . When the regexp engine (program module that implements searching for regexps) comes across \b , it checks that the position in the string is a word boundary.


A word boundary, in most regex dialects, is a position between \w and \W (non-word char), or at the beginning or end of a string if it begins or ends (respectively) with a word character ([0-9A-Za-z_]).

So, in the string "-12", it would match before the 1 or after the 2. The dash is not a word character.


In the course of learning regular expression, I was really stuck in the metacharacter which is \b. I indeed didn't comprehend its meaning while I was asking myself "what it is, what it is" repetitively. After some attempts by using the website, I watch out the pink vertical dashes at the every beginning of words and at the end of words. I got it its meaning well at that time. It's now exactly word(\w)-boundary.

My view is merely to immensely understanding-oriented. Logic behind of it should be examined from another answers.

enter image description here


A word boundary can occur in one of three positions:

  1. Before the first character in the string, if the first character is a word character.
  2. After the last character in the string, if the last character is a word character.
  3. Between two characters in the string, where one is a word character and the other is not a word character.

Word characters are alpha-numeric; a minus sign is not. Taken from Regex Tutorial.


A word boundary is a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one.


I would like to explain Alan Moore's answer

A word boundary is a position that is either preceded by a word character and not followed by one or followed by a word character and not preceded by one.

Suppose I have a string "This is a cat, and she's awesome", and I am supposed to replace all occurrence(s) the letter 'a' only if this letter exists at the "Boundary of a word", i.e. the letter a inside 'cat' should not be replaced.

So I'll perform regex (in Python) as

re.sub(r"\ba","e", myString.strip()) //replace a with e

so the output will be

This is e cat end she's ewesome


I talk about what \b-style regex boundaries actually are here.

The short story is that they’re conditional. Their behavior depends on what they’re next to.

# same as using a \b before:
(?(?=\w) (?<!\w)  | (?<!\W) )

# same as using a \b after:
(?(?<=\w) (?!\w)  | (?!\W)  )

Sometimes that isn’t what you want. See my other answer for elaboration.