
Beginning and end of words in sed and grep

Tags: regex, sed

I don't understand the difference between \b and \< in GNU sed and GNU grep. It seems to me that \b can always replace \< and \> without changing the set of matching strings.

More specifically, I am trying to find examples in which \bsomething and \<something do not match exactly the same strings.

Same question for something\b and something\>.

Thank you

asked Jun 29 '13 16:06 by anilomjf



1 Answer

I suspect that it very rarely makes a difference whether you use (the more common) \b or (the more specific) \< and \>, but I can think of an example where it would. The example is quite contrived, but it demonstrates that the choice can matter in some cases.

If I have the following text:

this is his pig

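and I want to know whether /\bis\b/ matches, it makes no difference if I use /\<is\>/ instead: both match the standalone word 'is'. Direction does matter, though. \< matches only where a word begins and \> only where a word ends, so the swapped form /\>is\</ cannot match here: no word ends immediately before the 'i', and no word begins immediately after the 's'.

The same holds if my text is instead

is this his pig

Before the 'is' there is now only a word-initial boundary, at the very start of the line. /\bis\b/ matches, and of course /\<is\>/ does too, but /\>is\</ does not.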
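
The three patterns can be checked directly against both sample lines. This is a minimal sketch assuming GNU grep; `matches` is a hypothetical helper written for this illustration, not part of grep.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if pattern $1 matches text $2, else "no".
matches() {
    if printf '%s\n' "$2" | grep -q "$1"; then echo yes; else echo no; fi
}

# Try every pattern from the answer on both sample lines.
for text in 'this is his pig' 'is this his pig'; do
    for pattern in '\bis\b' '\<is\>' '\>is\<'; do
        printf '%-16s %-8s %s\n' "$text" "$pattern" "$(matches "$pattern" "$text")"
    done
done
```

With GNU grep, \bis\b and \<is\> report a match on both lines, while \>is\< reports none on either: \> would need a word to end right before the 'i', which is itself a word character.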

In real life, though, I think it is not common that you really need to be able to make this distinction, which is why (at least outside of sed) \b is the normal word boundary marker for regular expressions.
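
One case where \b and \< do select different strings is a pattern whose first character is not a word character. The following is a small sketch assuming GNU sed; the sample text 'a-b' and the variable names are made up for illustration.

```shell
#!/bin/sh
text='a-b'

# \b only needs a word/non-word transition, which exists between 'a' and '-'.
with_b=$(printf '%s\n' "$text" | sed 's/\b-/+/')

# \< needs a word to begin at this position, but '-' is not a word character.
with_lt=$(printf '%s\n' "$text" | sed 's/\<-/+/')

# \> needs a word to end at this position, and 'a' does end here.
with_gt=$(printf '%s\n' "$text" | sed 's/\>-/+/')

printf '%s %s %s\n' "$with_b" "$with_lt" "$with_gt"   # a+b a-b a+b
```

Here \b behaves like \> rather than \<, so replacing \< with \b changes the result. When the character next to the anchor is a word character, as in \<something, the two forms should agree.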

answered Sep 28 '22 02:09 by iconoclast