
How does \b work when using regular expressions?

I have a sentence and want to display the word, or all the words, that follow a particular word. For example, in The quick brown fox jumps over the lazy dog I would like to display the word fox, which comes after brown. I know I can use a positive lookbehind, e.g. (?<=brown\s)(\w+), but I don't quite understand what \b adds in (?<=\bbrown\s)(\w+). I am using http://gskinner.com/RegExr/ as my tester.

PeanutsMonkey asked Sep 30 '11 01:09



1 Answer

\b is a zero-width assertion. That means it does not match a character; it matches a position, with one thing on its left side and another thing on its right side.
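To make "matches a position" concrete, here is a minimal sketch in Python (my choice of language; the question itself uses RegExr). It lists every position where \b succeeds and shows that the matched text is always empty. The variable names are only for illustration.

```python
import re

text = "The quick brown fox"

# Collect every position at which \b succeeds.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

# Each match consumes no characters, so the matched text is the empty string.
matched_text = [m.group() for m in re.finditer(r"\b", text)]

print(boundaries)    # indexes into text, not characters
print(matched_text)  # a list of empty strings
```

Each index is the start or the end of a word, which is why \b is useful as an anchor and never shows up in the matched text.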

The word boundary \b matches at a change from \w (a word character) to \W (a non-word character), or from \W to \w. The start and the end of the string count as the non-word side.

Which characters are included in \w depends on your regex engine. At a minimum it contains all ASCII letters, all ASCII digits and the underscore. If your engine supports Unicode, \w may also include every character with the Unicode property Letter or Number.

\W matches every character that is NOT in \w.
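As an illustration of how \w varies, here is a small sketch using Python's re module, where \w is Unicode-aware for str patterns by default and the re.ASCII flag restricts it to ASCII letters, digits and the underscore. Other engines may behave differently; the sample text is made up.

```python
import re

# \u00e9 is "é" and \u00ef is "ï", written as escapes to avoid encoding issues.
text = "caf\u00e9 na\u00efve snake_case x2"

# Default for str patterns in Python 3: Unicode letters and digits count as \w.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(r"\w+", text, re.ASCII)

# \W is the complement of \w: here it picks out the hyphen and the space.
non_word = re.findall(r"\W", "a-b c")

print(unicode_words)
print(ascii_words)
print(non_word)
```

Because \b is defined in terms of \w and \W, the same flag also changes where word boundaries fall.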

\bbrown\s

will match here

The quick brown fox
         ^^

but not here

The quick bbbbrown fox

because there is no word boundary between b and brown, i.e. no change from a non-word character to a word character: both characters are in \w.
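The two cases can be checked quickly. This is a sketch in Python rather than RegExr, so treat it as an illustration; the names hit and miss are mine.

```python
import re

pattern = re.compile(r"\bbrown\s")

# A space precedes "brown", so there is a word boundary before the b.
hit = pattern.search("The quick brown fox")

# "brown" is preceded by another b here, so \b fails and nothing matches.
miss = pattern.search("The quick bbbbrown fox")

print(hit.group() if hit else None)  # the matched text, including the trailing space
print(miss)                          # None
```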

When the regex engine reaches a \b, it looks at the characters on both sides of the current position. On the right side is the b of brown, a word character. For \b to succeed, the character on the left side must then be a non-word character. If it is a space (which is not in \w), the \b before the b succeeds. BUT if it is another b, the \b fails, and so \bbrown does not match "bbrown".
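The look-left, look-right check can be written out by hand. The following is only a sketch of the idea in Python, not how any engine actually implements \b. The helper names is_word_char and is_word_boundary are hypothetical, and the start and end of the string are treated as the non-word side.

```python
import re

def is_word_char(ch: str) -> bool:
    """True if the single character ch is in \\w."""
    return re.fullmatch(r"\w", ch) is not None

def is_word_boundary(text: str, pos: int) -> bool:
    """Hand-written equivalent of \\b at index pos (illustrative only)."""
    # Look left: is there a word character just before pos?
    left = pos > 0 and is_word_char(text[pos - 1])
    # Look right: is there a word character at pos?
    right = pos < len(text) and is_word_char(text[pos])
    # A boundary exists only when exactly one side is a word character.
    return left != right

print(is_word_boundary("quick brown", 6))  # space on the left, b on the right
print(is_word_boundary("bbrown", 1))       # b on both sides
```

The first call reports a boundary and the second does not, which is the difference between the two strings.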

The regex brown matches in both strings, "quick brown" and "bbrown", whereas the regex \bbrown matches only in "quick brown" AND NOT in "bbrown".
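Applied to the lookbehind from the question, the same difference decides which following word is captured. This is a sketch in Python; the sample sentence with "bbrown cat" is made up to show the contrast, and other engines may restrict what is allowed inside a lookbehind.

```python
import re

samples = ["quick brown", "bbrown"]

# Without \b, "brown" is found inside "bbrown" as well.
plain = [bool(re.search(r"brown", s)) for s in samples]

# With \b, "brown" must start at a word boundary.
anchored = [bool(re.search(r"\bbrown", s)) for s in samples]

sentence = "The quick brown fox and the bbrown cat"

# The lookbehinds from the question, with and without \b.
with_b = re.findall(r"(?<=\bbrown\s)(\w+)", sentence)
without_b = re.findall(r"(?<=brown\s)(\w+)", sentence)

print(plain, anchored)
print(with_b)     # only the word after the whole word "brown"
print(without_b)  # also the word after "bbrown"
```

So in (?<=\bbrown\s)(\w+) the \b ensures that the word before the capture is exactly brown and not merely a word ending in brown.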

For more details, see the explanation of word boundaries on www.regular-expressions.info.

stema answered Oct 11 '22 14:10
