Difference between \b and \s in Regular Expression

I was learning regular expressions in iOS and saw this tutorial: http://www.raywenderlich.com/30288/nsregularexpression-tutorial-and-cheat-sheet

It reads like this for \b:

\b matches word boundary characters such as spaces and punctuation. to\b will match the "to" in "to the moon" and "to!", but it will not match "tomorrow". \b is handy for "whole word" type matching.

and \s:

\s matches whitespace characters such as spaces, tabs, and newlines. hello\s will match "hello " in "Well, hello there!".

I have two questions about this:

1) What is the difference between \s and \b? When should each be used?

2) "\b is handy for 'whole word' type matching": I don't understand what this means.

I need some guidance on these two.

lakshmen asked Jun 10 '13 08:06



1 Answer

\b Boundary characters

\b matches the boundary itself, not the boundary character (such as a comma or period). It is a zero-width match: it consumes no characters, but it can be used, for example, to find an e at the end of a word.
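To make the "no length" point concrete, here is a minimal sketch in Python. The question is about NSRegularExpression; Python's re module is used only for illustration, on the assumption that \b behaves the same way for this plain ASCII input. The string "to!" comes from the tutorial quote in the question.

```python
import re

# Every match of a bare \b is empty: start and end are the same position.
boundary_spans = [m.span() for m in re.finditer(r"\b", "to!")]
print(boundary_spans)  # [(0, 0), (2, 2)]

# Position 0 lies between the start of the string and "t";
# position 2 lies between "o" and "!". Neither match contains a character.
boundary_texts = [m.group() for m in re.finditer(r"\b", "to!")]
print(boundary_texts)  # ['', '']
```

There is no boundary after the "!", because neither side of that position is a word character.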

For example in the sentence: "Hello there, this is one test. Testing"

The regex e\b will match an e if it's at the end of the word (followed by a word boundary). Notice in the image below that the e in "test" and "Testing" didn't match since the "e" is not followed by a boundary.

[Image: the sentence with the two e\b matches highlighted]
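Since the image is not available here, the same result can be reproduced with a short Python sketch (Python's re module stands in for NSRegularExpression, assuming \b behaves the same on this ASCII sentence):

```python
import re

sentence = "Hello there, this is one test. Testing"

# e\b: an "e" that is immediately followed by a word boundary.
# The boundary itself is zero-width, so only the "e" is part of the match.
e_boundary = [(m.start(), m.group()) for m in re.finditer(r"e\b", sentence)]
print(e_boundary)  # [(10, 'e'), (23, 'e')]
```

The two hits are the final e of "there" (followed by a comma) and the e of "one" (followed by a space). The e in "Hello", "test" and "Testing" is followed by another word character, so there is no boundary after it.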

\s Whitespace

\s on the other hand matches the actual white space characters (like spaces and tabs). In the same sentence it will match all the spaces between the words.

[Image: the sentence with the \s matches (the spaces) highlighted]
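As a rough illustration in Python (again assuming \s behaves the same as in NSRegularExpression for plain spaces), each \s match is a real character with length 1:

```python
import re

sentence = "Hello there, this is one test. Testing"

# \s consumes an actual whitespace character, so every match has length 1.
space_positions = [m.start() for m in re.finditer(r"\s", sentence)]
print(space_positions)  # [5, 12, 17, 20, 24, 30]

# Because \s matches real characters, it can be used to split the text.
# Punctuation stays attached to the words, since a comma is not whitespace.
pieces = re.split(r"\s+", sentence)
print(pieces)  # ['Hello', 'there,', 'this', 'is', 'one', 'test.', 'Testing']
```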


Edit

Since \b doesn't make much sense alone, I showed how to use it as e\b (above). The OP asked (in a comment) what e\s would match compared to e\b, to better explain the difference between \b and \s.

In the same string there is only one match for e\s, while there were two matches for e\b, since the comma is not whitespace. Note that the e\s match (image 3) includes the whitespace, whereas the e\b match (image 1) doesn't.

[Image: the sentence with the single e\s match highlighted]
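The comparison can be checked side by side with a small Python sketch (illustrative only; the original context is NSRegularExpression, and the variable names are made up for this example):

```python
import re

sentence = "Hello there, this is one test. Testing"

# e\b: the boundary is zero-width, so the matched text is just "e".
boundary_hits = [m.group() for m in re.finditer(r"e\b", sentence)]
# e\s: the whitespace is consumed, so the matched text is "e" plus a space.
whitespace_hits = [m.group() for m in re.finditer(r"e\s", sentence)]

print(boundary_hits)    # ['e', 'e']
print(whitespace_hits)  # ['e ']
```

The e at the end of "there" is followed by a comma, which is a boundary but not whitespace, so only e\b matches it.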

David Rönnqvist answered Oct 01 '22 06:10