Grep for a string that ends with specific character

Tags: regex, grep, bash

Is there a way to use extended regular expressions to find a specific pattern that ends with a given string?

I mean, I want to match the first 3 lines but not the last:

file_number_one.pdf # comment
file_number_two.pdf # not interesting
testfile_number____three.pdf # some other stuff
myfilezipped.pdf.zip some comments and explanations

I know that in grep the metacharacter $ matches the end of a line, but I'm not interested in matching the end of a line, only the end of a string. Groups in grep are very odd; I don't understand them well yet.

I tried group matching. I have a similar regex, but it does not work with grep -E:

(\w+).pdf$

Is there a way to do string ending match in grep/egrep?

shadox asked Oct 21 '14 22:10



2 Answers

Your example can be handled by also matching the space after the string:

grep -E '\.pdf ' input.txt

What you call a "string" is similar to what grep calls a "word". A word is a run of alphanumeric characters and underscores. The nice thing about words is that you can match a word end with the special \>, which matches at the end of a word with a match of zero length. That also matches at the end of a line. But the set of word characters cannot be changed and does not contain punctuation: since . is not a word character, \.pdf\> would also match the .pdf in myfilezipped.pdf.zip, so we cannot use it here.
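To make the limitation concrete, here is a small sketch that counts matching lines for the word-end pattern and for a pattern that lists the allowed following characters explicitly. It assumes a grep that supports the \> extension (GNU grep does); the variable names sample, word_end and explicit are made up for illustration.

```shell
# Two lines from the question: one wanted, one unwanted.
sample='file_number_one.pdf # comment
myfilezipped.pdf.zip some comments and explanations'

# '.' is not a word character, so a word also ends right after "pdf" in ".pdf.zip".
word_end=$(printf '%s\n' "$sample" | grep -c '\.pdf\>')

# Naming the characters allowed after ".pdf" excludes the .zip line.
explicit=$(printf '%s\n' "$sample" | grep -cE '\.pdf[[:space:]#]|\.pdf$')

printf 'word-end matches: %s, explicit matches: %s\n' "$word_end" "$explicit"
```

The word-end pattern counts both lines, while the explicit pattern counts only the first one.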

If you need to match at the end of line too, where there is no space after the word, use:

grep -E '\.pdf |\.pdf$' input.txt

To include cases where the character after the file name is not a space character ' ' but other whitespace, like a tab (\t), or where the name is directly followed by a comment starting with #, use:

grep -E '\.pdf[[:space:]#]|\.pdf$' input.txt
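As a quick check, the pattern can be run against the four sample lines from the question. This is a minimal sketch assuming a POSIX shell and a grep that supports -E; the variable names input and matches are only illustrative.

```shell
# The sample lines from the question.
input='file_number_one.pdf # comment
file_number_two.pdf # not interesting
testfile_number____three.pdf # some other stuff
myfilezipped.pdf.zip some comments and explanations'

# ".pdf" followed by whitespace or '#', or ".pdf" at the very end of the line.
matches=$(printf '%s\n' "$input" | grep -E '\.pdf[[:space:]#]|\.pdf$')

printf '%s\n' "$matches"
```

This should print the first three lines and leave out the myfilezipped.pdf.zip line, because there .pdf is followed by a dot.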

I will illustrate the matching of word boundaries too, because that would be the perfect solution, except that we cannot use it here because we cannot change the set of characters that count as parts of a word.

The input contains foo as a separate word and as part of longer words, where foo is not at the end of the word and therefore not at a word boundary:

$ printf 'foo bar\nfoo.bar\nfoobar\nfoo_bar\nfoo\n'
foo bar
foo.bar
foobar
foo_bar
foo

Now, to match the boundaries of words, we can use \< for the beginning, and \> to match the end:

$ printf 'foo bar\nfoo.bar\nfoobar\nfoo_bar\nfoo\n' | grep 'foo\>'
foo bar
foo.bar
foo

Note how _ counts as a word character; otherwise, word characters are only the alphanumerics, [a-zA-Z0-9].
Also note how foo at the end of a line is matched, in the line containing only foo. We do not need a special case for the end of line.
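The start-of-word anchor \< mentioned above works the same way and can be combined with \>. The following is a small sketch assuming a grep that supports both extensions (GNU grep does); the input lines barfoo and bar_foo and the variable names are added here for illustration and are not part of the example above.

```shell
# Test lines: foo at the start of a word, inside a word, and on its own.
words='foo bar
foo.bar
foobar
barfoo
bar_foo
foo'

# \< anchors at the start of a word: "barfoo" and "bar_foo" are rejected.
starts=$(printf '%s\n' "$words" | grep '\<foo')

# \<foo\> requires a word boundary on both sides of foo.
whole=$(printf '%s\n' "$words" | grep '\<foo\>')

printf 'start of word:\n%s\n\nwhole word:\n%s\n' "$starts" "$whole"
```

Here the first pattern keeps foobar but drops barfoo and bar_foo, and the second keeps only the lines where foo stands as a complete word.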

Volker Siegel answered Sep 27 '22 16:09



You can use the \> operator:

grep 'word\>' fileName
Jasur Shukurov answered Sep 27 '22 16:09
