I don't understand the difference between <code>\b</code> and <code>\<</code> in GNU sed and GNU grep. It seems to me that <code>\b</code> can always replace <code>\<</code> and <code>\></code> without changing the set of matching strings. More specifically, I am trying to find examples in which <code>\bsomething</code> and <code>\<something</code> do not match exactly the same strings. Same question for <code>something\b</code> and <code>something\></code>. Thank you.

I suspect that it very rarely makes a difference whether you use (the more common) <code>\b</code> or (the more specific) <code>\<</code> and <code>\></code>, but I can think of an example where it would. It is quite contrived, but it shows that the choice can matter in some cases. If I have the following text: <pre class="prettyprint"><code>this is his pig </code></pre> and I want to know if <code>/\bis\b/</code> matches, it wouldn't matter if I instead used <code>/\<is\>/</code>: both match the standalone 'is' and nothing else, because a boundary followed by a word character can only be a word-initial one, and a boundary preceded by a word character can only be a word-final one. The reversed form <code>/\>is\</</code>, by contrast, does not match here. The same holds if my text is instead <pre class="prettyprint"><code>is this his pig </code></pre> The 'is' now sits at the start of the line, with a word-initial boundary before it and a word-final boundary after it. Using <code>/\bis\b/</code> matches, and of course <code>/\<is\>/</code> does too, but <code>/\>is\</</code> does not; it cannot match any text, since <code>\></code> requires that no word character follow, yet 'i' does. The difference shows up only when the character next to the anchor is not a word character: with 'pig,' as input, <code>/\b,/</code> and <code>/\>,/</code> both match the comma, while <code>/\<,/</code> never matches. In real life, though, I think it is not common that you really need to be able to make this distinction, which is why (at least outside of sed) <code>\b</code> is the normal word boundary marker for regular expressions.
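To check this kind of claim by hand, here is a minimal sketch, assuming GNU sed and GNU grep (the three anchors are GNU extensions, so other implementations may behave differently). The variable names are only illustrative. It compares the anchors around a word, on their own, and next to a non-word character:

```shell
#!/bin/sh
# Assumption: sed and grep are the GNU versions; \b, \< and \> are GNU extensions.
text='this is his pig'

# With word characters next to the anchors, \b and \< / \> pick the same match.
with_b=$(printf '%s\n' "$text" | grep -o '\bis\b')
with_lt_gt=$(printf '%s\n' "$text" | grep -o '\<is\>')

# On their own, the anchors mark different positions.
both_edges=$(printf '%s\n' "$text" | sed 's/\b/|/g')   # |this| |is| |his| |pig|
word_starts=$(printf '%s\n' "$text" | sed 's/\</|/g')  # |this |is |his |pig
word_ends=$(printf '%s\n' "$text" | sed 's/\>/|/g')    # this| is| his| pig|

# Next to a non-word character (the comma), \b acts like \> and \< cannot match.
# grep -c prints the number of matching lines; "|| true" ignores its exit status.
comma_b=$(printf 'pig,\n' | grep -c '\b,' || true)
comma_gt=$(printf 'pig,\n' | grep -c '\>,' || true)
comma_lt=$(printf 'pig,\n' | grep -c '\<,' || true)

printf '%s\n' "$with_b" "$with_lt_gt" "$both_edges" "$word_starts" "$word_ends"
printf '%s %s %s\n' "$comma_b" "$comma_gt" "$comma_lt"
```

The first two results are identical, which is the common case the question describes. The remaining ones show where <code>\b</code> stands for either kind of boundary while <code>\<</code> and <code>\></code> each pick only one.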

Beginning and end of words in sed and grep

1 Answer



answered Sep 28 '22 02:09

iconoclast


Tags:

regex

sed

anilomjf
