
What is a non-word boundary (\B) in regex, compared to a word boundary?


DarkLightA asked Dec 27 '10 20:12



1 Answer

A word boundary (\b) is a zero-width match that can occur:

  • Between a word character (\w) and a non-word character (\W) or
  • Between a word character and the start or end of the string.

In JavaScript, \w is defined as [A-Za-z0-9_] and \W is any other character.
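
The definition above can be applied by hand. Here is a minimal sketch, not how any engine actually implements it: the helper names (isWordChar, isWordBoundary, engineWordBoundary) and the sample string "foo_bar 42" are made up for illustration. It cross-checks the hand-rolled rule against the engine's own \b by using the sticky (y) flag, which pins a match attempt to a single position.

```javascript
// Hypothetical helpers for illustration; not part of any library.

// True if ch is one of [A-Za-z0-9_]. Past either end of the string,
// str[i] is undefined, which counts as "not a word character".
function isWordChar(ch) {
  return ch !== undefined && /^[A-Za-z0-9_]$/.test(ch);
}

// \b holds at pos when exactly one neighbour is a word character.
function isWordBoundary(str, pos) {
  return isWordChar(str[pos - 1]) !== isWordChar(str[pos]);
}

// Ask the regex engine directly: with the sticky flag, the match
// attempt happens only at lastIndex.
function engineWordBoundary(str, pos) {
  const re = /\b/y;
  re.lastIndex = pos;
  return re.test(str);
}

const sample = "foo_bar 42";
const byHand = [];
const byEngine = [];
for (let pos = 0; pos <= sample.length; pos++) {
  if (isWordBoundary(sample, pos)) byHand.push(pos);
  if (engineWordBoundary(sample, pos)) byEngine.push(pos);
}

console.log(byHand);   // [ 0, 7, 8, 10 ]
console.log(byEngine); // [ 0, 7, 8, 10 ]
```

Note that the underscore and the digits count as word characters, so there is no boundary inside "foo_bar" or inside "42".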

The negated version of \b, written \B, is a zero-width match at any position where the above does not hold. Therefore it can match:

  • Between two word characters.
  • Between two non-word characters.
  • Between a non-word character and the start or end of the string.
  • At the only position in an empty string.
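
Each of these cases can be checked in isolation. The patterns and inputs below are just illustrative choices (assuming JavaScript's default, non-Unicode \w); each one pins \B to exactly one of the situations listed above.

```javascript
// Each pattern forces \B to be tested at one specific position.
const cases = [
  [/a\Bb/, "ab"],  // between two word characters
  [/,\B /, ", "],  // between two non-word characters
  [/^\B!/, "!a"],  // start of string, then a non-word character
  [/!\B$/, "a!"],  // non-word character, then end of string
  [/^\B$/, ""],    // the empty string
];

const results = cases.map(([re, input]) => re.test(input));
console.log(results); // [ true, true, true, true, true ]

// For contrast: \B fails at positions where \b holds.
const atStartOfWord = /^\Ba/.test("a"); // start of string, then a word character
const atEndOfWord = /a\B$/.test("a");   // word character, then end of string
console.log(atStartOfWord, atEndOfWord); // false false
```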

For example if the string is "Hello, world!" then \b matches in the following places:

     H e l l o ,   w o r l d !
    ^         ^   ^         ^

And \B matches those places where \b doesn't match:

     H e l l o ,   w o r l d !
      ^ ^ ^ ^   ^   ^ ^ ^ ^   ^
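
The same positions can be read off the engine itself. This is a small sketch: the variable names are arbitrary, and the "|" marker is just a convenient way to make zero-width matches visible.

```javascript
const text = "Hello, world!";

// Zero-width matches still carry an index, so matchAll lists the positions.
const wordBoundaries = [...text.matchAll(/\b/g)].map((m) => m.index);
const nonBoundaries = [...text.matchAll(/\B/g)].map((m) => m.index);

console.log(wordBoundaries); // [ 0, 5, 7, 12 ]
console.log(nonBoundaries);  // [ 1, 2, 3, 4, 6, 8, 9, 10, 11, 13 ]

// Inserting a marker at each match shows where the positions fall.
const markedB = text.replace(/\b/g, "|");
const markedNotB = text.replace(/\B/g, "|");
console.log(markedB);    // |Hello|, |world|!
console.log(markedNotB); // H|e|l|l|o,| w|o|r|l|d!|
```

Every position from 0 to 13 appears in exactly one of the two lists, which is what "negated version" means here.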
Mark Byers answered Sep 20 '22 13:09