
Emacs regular expressions: what can \< and \> do that \b cannot?

Tags: regex, emacs, word

The "Regexp Backslash" section of the GNU Emacs Manual says that \< matches at the beginning of a word, \> matches at the end of a word, and \b matches a word boundary. \b works just as it does in other, non-Emacs regular expression flavors, but \< and \> seem to be particular to Emacs regular expressions. Are there cases where \< and \> are needed instead of \b? For instance, \bword\b would match the same text as \<word\>, and the only difference is that the latter is more readable.

Yoo asked Apr 30 '11 19:04



2 Answers

You can get unexpected results if you assume they behave the same.
What can \< and \> do that \b cannot?
The answer is that \< and \> are explicit: each one matches a specific end of a word, and only that end.
\b is general: either end of a word will match.

GNU Operators * Word Operators

# Requires GNU sed: \b, \< and \> are GNU extensions.
# Each command splits the line at one anchor and prints both halves.

# Single spaces between words
line="cat dog sky"
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo
# Double spaces between words
line="cat  dog  sky"
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo
# Double spaces between words, plus two trailing spaces
line="cat  dog  sky  "
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo

output

# |cat dog |sky|
# |cat dog| sky|
# |cat dog |sky|

# |cat  dog  |sky|
# |cat  dog|  sky|
# |cat  dog  |sky|

# |cat  dog  sky|  |
# |cat  dog  sky|  |
# |cat  dog  |sky  |
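
A simpler way to see which end each anchor picks is to mark every position where it matches. This is only a minimal sketch, assuming a recent GNU sed; the `mark` helper is a hypothetical name introduced for illustration and is not part of the commands above.

```shell
#!/bin/sh
# Sketch only; assumes GNU sed, where \b, \< and \> are available as extensions.

# mark ANCHOR TEXT: print TEXT with an X inserted at every position
# where ANCHOR matches. (Hypothetical helper, for illustration only.)
mark() {
    printf '%s\n' "$2" | sed "s/$1/X/g"
}

mark '\b' 'cat dog'   # expected with recent GNU sed: XcatX XdogX
mark '\<' 'cat dog'   # expected with recent GNU sed: Xcat Xdog
mark '\>' 'cat dog'   # expected with recent GNU sed: catX dogX
```

\b marks both ends of every word, while \< and \> each mark only one end.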
Peter.O answered Nov 16 '22 00:11


It looks to me like \<.*?\> would match only a series of word characters, while \b.*?\b would match either a series of word characters or a series of non-word characters, since \b can also accept the end of one word and then the beginning of the next. If you force the expression between the two anchors to be a word, they do indeed act the same.
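
The non-word case can be checked from a shell. This is a rough sketch, assuming GNU sed; since sed has no lazy `.*?`, it swaps in an explicit run of non-word characters (`NONWORD`), and the `bracket` helper is a hypothetical name used only here.

```shell
#!/bin/sh
# Sketch only; assumes GNU sed. NONWORD stands in for the lazy .*? above,
# which sed does not support (a substitution made for this example).
NONWORD='[^[:alnum:]_]\{1,\}'

# bracket OPEN CLOSE TEXT: wrap every "OPEN NONWORD CLOSE" match in <...>.
# (Hypothetical helper, for illustration only.)
bracket() {
    printf '%s\n' "$3" | sed "s/$1$NONWORD$2/<&>/g"
}

bracket '\b' '\b' 'cat, dog'   # \b...\b can span the gap between two words
bracket '\<' '\>' 'cat, dog'   # \<...\> cannot, so the line stays unchanged
```

With a recent GNU sed, the first call should print `cat<, >dog`. The second prints `cat, dog`, because \< requires a word character right after it.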

Of course, you could approximate the behavior of \< and \> with \b\w and \w\b. So I guess the answer is that yes, it's mostly for readability. Then again, isn't that what most escape sequences in regular expressions are for?
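
As a quick comparison of the anchors with their \b-based stand-ins, here is a minimal sketch, assuming GNU sed (where \w is also an extension):

```shell
#!/bin/sh
# Sketch only; assumes GNU sed (\b, \<, \> and \w are GNU extensions).

# Mark word starts: the anchor \< versus the \b\w stand-in.
printf 'cat dog\n' | sed 's/\</[/g'
printf 'cat dog\n' | sed 's/\b\w/[&/g'

# Mark word ends: the anchor \> versus the \w\b stand-in.
printf 'cat dog\n' | sed 's/\>/]/g'
printf 'cat dog\n' | sed 's/\w\b/&]/g'
```

With a recent GNU sed, both lines of each pair should print the same text (`[cat [dog` and `cat] dog]`). The stand-ins are not zero-width, though: \b\w and \w\b consume one word character, which is why the replacement needs `&` to put it back.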

dlras2 answered Nov 16 '22 02:11