

Emacs regular expression: what can \< and \> do that \b cannot?

Tags:

regex

emacs

word

The "Regexp Backslash" section of the GNU Emacs Manual says that \< matches at the beginning of a word, \> matches at the end of a word, and \b matches a word boundary. \b behaves just as it does in other, non-Emacs regular expressions, but \< and \> seem to be particular to Emacs regular expressions. Are there cases where \< and \> are needed instead of \b? For instance, \bword\b would match the same text as \<word\>, and the only difference is that the latter is more readable.


asked Apr 30 '11 19:04

Yoo

2 Answers

You can get unexpected results if you assume they behave the same.
What can \< and \> do that \b cannot?
The answer is that \< and \> are explicit: each matches one particular end of a word, and only that end.
\b is general: either end of a word will match.

GNU Operators * Word Operators

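# Requires GNU sed: \b, \< and \> are GNU extensions.
# Each command prints the text captured on either side of the word operator.
line="cat dog sky"      # single spaces between words
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"   # \b: either end of a word
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"   # \>: end of a word only
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"   # \<: start of a word only
echo
line="cat  dog  sky"    # double spaces between words
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo
line="cat  dog  sky  "  # double spaces, plus trailing spaces
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo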

output

# |cat dog |sky|
# |cat dog| sky|
# |cat dog |sky|

# |cat  dog  |sky|
# |cat  dog|  sky|
# |cat  dog  |sky|

# |cat  dog  sky|  |
# |cat  dog  sky|  |
# |cat  dog  |sky  |
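
The one-sided versus two-sided distinction can also be seen by listing the offsets at which each assertion matches. This is only a minimal sketch in Python, whose re module has \b but no \< or \>. The lookaround patterns standing in for them, and the helper name positions, are assumptions of this example rather than anything sed or Emacs provides.

```python
import re

# Python's re has \b but not \< or \>, so the one-sided assertions are
# emulated with lookarounds (an assumption of this sketch).
WORD_START = r"\b(?=\w)"   # stands in for \<
WORD_END = r"\b(?<=\w)"    # stands in for \>
EITHER_END = r"\b"


def positions(pattern, text):
    """Return the offsets at which a zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, text)]


line = "cat dog sky"
print(positions(WORD_START, line))  # [0, 4, 8]
print(positions(WORD_END, line))    # [3, 7, 11]
print(positions(EITHER_END, line))  # [0, 3, 4, 7, 8, 11]
```

Every offset reported for \b is reported by exactly one of the two one-sided patterns, so a pattern built on \b has twice as many places where it may stop as one built on \< or \> alone.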


answered Nov 16 '22 00:11

Peter.O

It looks to me like \<.*?\> would match only a series of word characters, while \b.*?\b would match either a series of word characters or a series of non-word characters, since it can also start at the end of one word and stop at the beginning of the next. If you force the expression between the two to be a word, they do indeed act the same.

Of course, you could replicate the behavior of \< and \> with \b\w and \w\b (keeping in mind that these consume one word character, whereas \< and \> are zero-width). So I guess the answer is that yes, it's mostly for readability. Then again, isn't that what most escape sequences in regular expressions are for?
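
To make that concrete, here is a rough sketch in Python rather than Emacs. Python's re module lacks \< and \>, so they are emulated with the zero-width lookarounds \b(?=\w) and \b(?<=\w); that emulation is an assumption of the example. It also uses .+? instead of .*? so that empty matches at a single boundary do not clutter the results.

```python
import re

text = "cat  dog"

# \b on both sides: any span between two adjacent boundaries qualifies,
# so the run of spaces is returned alongside the words.
between_boundaries = re.findall(r"\b.+?\b", text)

# Emulated \< ... \>: a match must start at a word start and stop at a
# word end, so only the words themselves are returned.
whole_words = re.findall(r"\b(?=\w).+?\b(?<=\w)", text)

# With a literal word in the middle, both forms find the same thing.
plain = re.findall(r"\bdog\b", text)
one_sided = re.findall(r"\b(?=\w)dog\b(?<=\w)", text)

print(between_boundaries)  # ['cat', '  ', 'dog']
print(whole_words)         # ['cat', 'dog']
print(plain == one_sided)  # True
```

The difference only shows up when whatever sits between the two anchors is allowed to match non-word characters.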

answered Nov 16 '22 02:11

dlras2
