Difference between \w and \b regular expression meta characters

Tags:

regex

People also ask

What is the difference between \b and \B in regular expressions?

\B is the negated version of \b. \B matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters.
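
A minimal sketch in Python (the sample string is made up for illustration) that lists the positions where \b and \B match. Every position in the string belongs to exactly one of the two lists:

```python
import re

text = "ab, cd"  # hypothetical sample string

# Positions where \b matches: the edges of each word.
boundary = [m.start() for m in re.finditer(r"\b", text)]

# Positions where \B matches: every position where \b does not.
non_boundary = [m.start() for m in re.finditer(r"\B", text)]

print(boundary)      # [0, 2, 4, 6]
print(non_boundary)  # [1, 3, 5]
```

Position 3 sits between the comma and the space, two non-word characters, so \B matches there; positions 1 and 5 sit between two word characters.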

What does \b mean in regex?

\b is a zero-width assertion. That means it does not match a character; it matches a position with one thing on the left side and another thing on the right side. The word boundary \b matches on a change from \w (a word character) to \W (a non-word character), or from \W to \w.

What characters are in \W regex?

\w (word character) matches any single letter, digit or underscore (same as [a-zA-Z0-9_] in ASCII-only flavors). The uppercase counterpart \W (non-word character) matches any single character that is not matched by \w (same as [^a-zA-Z0-9_]). For the shorthand classes \w, \d and \s, as well as for \b, the uppercase form is the inverse of the lowercase one.
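
As a rough illustration in Python (the sample string is an assumption, and re.ASCII is passed so that \w is limited to [a-zA-Z0-9_]), every character is matched by exactly one of \w and \W:

```python
import re

text = "snake_case 42!"  # hypothetical sample string

# re.ASCII restricts \w to [a-zA-Z0-9_]; \W is its complement.
word_chars = re.findall(r"\w", text, flags=re.ASCII)
non_word_chars = re.findall(r"\W", text, flags=re.ASCII)

print("".join(word_chars))  # snake_case42
print(non_word_chars)       # [' ', '!']
```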

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.

There are three different positions that qualify as word boundaries:

Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
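
The three rules above can be sketched by hand and compared with what the regex engine reports. This is only an illustration: is_word_char and is_boundary are hypothetical helper names, and re.ASCII is assumed so that word characters are exactly [a-zA-Z0-9_]:

```python
import re

WORD = re.compile(r"\w", re.ASCII)

def is_word_char(ch: str) -> bool:
    """True if ch is one of [a-zA-Z0-9_]."""
    return bool(WORD.fullmatch(ch))

def is_boundary(text: str, i: int) -> bool:
    """Hand-written check of the three rules for position i (0..len(text))."""
    # The start of the string counts as "no word character" on the left,
    # the end of the string as "no word character" on the right.
    before = i > 0 and is_word_char(text[i - 1])
    after = i < len(text) and is_word_char(text[i])
    return before != after

text = "hi, you"  # hypothetical sample string
by_hand = [i for i in range(len(text) + 1) if is_boundary(text, i)]
by_regex = [m.start() for m in re.finditer(r"\b", text, flags=re.ASCII)]

print(by_hand)   # [0, 2, 4, 7]
print(by_regex)  # [0, 2, 4, 7]
```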

Simply put: \b allows you to perform a "whole words only" search using a regular expression in the form of \bword\b. A "word character" is a character that can be used to form words. All characters that are not "word characters" are "non-word characters".
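
A short Python sketch of a "whole words only" search, using a made-up sentence. Note that the underscore is a word character, so cat_food does not contain the whole word cat:

```python
import re

text = "The cat sat in a category; concatenate the cat."  # hypothetical sample

# Without \b, "cat" also matches inside longer words.
loose = re.findall(r"cat", text)

# With \b on both sides, only the standalone word matches.
whole = re.findall(r"\bcat\b", text)

print(len(loose))  # 4
print(len(whole))  # 2

# "_" is a word character, so there is no boundary between "t" and "_".
print(re.findall(r"\bcat\b", "cat_food"))  # []
```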

In all flavors, the characters [a-zA-Z0-9_] are word characters. These are also matched by the short-hand character class \w. Flavors showing "ascii" for word boundaries in the flavor comparison recognize only these as word characters.

\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.

\W is short for [^\w], the negated version of \w.

\w matches a word character. \b is a zero-width match: it matches a position that has a word character on one side and something that's not a word character on the other. (Examples of things that aren't word characters include whitespace and the beginning and end of the string.)

\w matches a, b, c, d, e, and f in "abc def"
\b matches the (zero-width) position before a, after c, before d, and after f in "abc def"
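
The same example can be checked in Python; a minimal sketch:

```python
import re

text = "abc def"

# \w consumes one word character per match.
print(re.findall(r"\w", text))  # ['a', 'b', 'c', 'd', 'e', 'f']

# \b consumes nothing: each match is an empty string at a position.
matches = list(re.finditer(r"\b", text))
positions = [m.start() for m in matches]
print(positions)  # [0, 3, 4, 7]
print(all(m.group() == "" for m in matches))  # True
```

Positions 0, 3, 4 and 7 are before a, after c, before d and after f.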

See: http://www.regular-expressions.info/reference.html/

@Mahender, you probably meant the difference between \W (instead of \w) and \b. If not, then I would agree with @BoltClock and @jwismar above. Otherwise continue reading.

\W would match any non-word character, so it's easy to try to use it to match word boundaries. The problem is that it will not match at the start or end of a line, because it needs an actual character to consume. \b is better suited for matching word boundaries, as it will also match at the start or end of a line. Roughly speaking (more experienced users can correct me here) \b can be thought of as (\W|^|$). [Edit: as @Ωmega mentions below, \b is a zero-length match so (\W|^|$) is not strictly correct, but hopefully it helps explain the difference]

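Quick example: for the string Hello World, .+\W matches "Hello " (with the trailing space) but cannot include World, because no non-word character follows it. .+\b can match through the end of World, because \b also matches at the end of the string; with a greedy .+ that is a single match, Hello World, while \w+\b finds Hello and World separately.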
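
Running these patterns in Python shows what each one actually returns. This is only a sketch; note that with a greedy .+ the \b pattern yields one match spanning both words, so \w+ is used as well to pick out the words one by one:

```python
import re

text = "Hello World"

# .+\W must end on a non-word character, so it stops after the space.
with_W = re.search(r".+\W", text).group()
print(repr(with_W))  # 'Hello '

# .+\b may end at the end of the string, so the greedy match takes it all.
with_b = re.search(r".+\b", text).group()
print(repr(with_b))  # 'Hello World'

# Matching word by word makes the difference clearer.
print(re.findall(r"\w+\W", text))  # ['Hello ']
print(re.findall(r"\w+\b", text))  # ['Hello', 'World']
```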

\b <= this is a word boundary.

Matches at a position that is followed by a word character but not preceded by a word character, or that is preceded by a word character but not followed by a word character.

\w <= stands for "word character".

It always matches the ASCII characters [A-Za-z0-9_]

Is there anything specific you are trying to match?

Some useful regex websites for beginners, or just to whet your appetite.

http://www.regular-expressions.info
http://www.javascriptkit.com/javatutors/redev2.shtml
http://www.virtuosimedia.com/dev/php/37-tested-php-perl-and-javascript-regular-expressions
http://www.i-programmer.info/programming/javascript/4862-master-javascript-regular-expressions.html

I found this to be a very useful book:

Mastering Regular Expressions by Jeffrey E.F. Friedl

\w is not a word boundary; it matches any word character, including digits and the underscore: [a-zA-Z0-9_]. \b is a word boundary, that is, it matches the position between a word character (\w) and a non-word character (\W, i.e. [^\w]), or between a word character and the start or end of the string.

These implementations may vary from language to language though.
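
As one illustration of such variation: in Python 3, \w and \b are Unicode-aware for str patterns by default, and the re.ASCII flag restricts them to [a-zA-Z0-9_]. A small sketch with a made-up sample string (written with escapes so the accented letters are unambiguous):

```python
import re

text = "na\u00efve caf\u00e9"  # "naïve café"

# Default (Unicode) mode: accented letters are word characters.
unicode_words = re.findall(r"\b\w+\b", text)

# ASCII mode: accented letters are non-word characters and split the words.
ascii_words = re.findall(r"\b\w+\b", text, flags=re.ASCII)

print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
```

Other languages and engines make different default choices, so it is worth checking the documentation of the one in use.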
