
grep with regexp: whitespace doesn't match unless I add an assertion

GNU grep 2.5.4 on bash 4.1.5(1) on Ubuntu 10.04

This matches

$ echo "this is a     line" | grep 'a[[:space:]]\+line'
this is a     line

But this doesn't

$ echo "this is a     line" | grep 'a\s\+line'

But this matches too

$ echo "this is a     line" | grep 'a\s\+\bline'
this is a     line

I don't understand why #2 does not match (whereas #1 does), yet #3 also matches. What's the difference here?

Ankur Agarwal asked Aug 10 '11 15:08



2 Answers

Take a look at your grep manpage. Perl added a lot of regular expression extensions that weren't in the original specification. However, because they proved so useful, many programs adopted them.

Unfortunately, grep is sometimes stuck in the past, because its behavior has to stay compatible with older versions of grep.

Some systems have egrep, which supports some extensions. Others let you use grep -E to get them. Still others have grep -P, which enables Perl extensions. I believe grep on Linux systems supports -P, which is not available on most Unix systems unless someone has replaced the system grep with the GNU version. Newer versions of Mac OS X also support the -P switch, but older versions do not.
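As a rough sketch of the options above, the snippet below runs the sample line from the question through grep -E and, only if the local grep accepts it, grep -P. The variable names (line, ere_result, pcre_result) are made up for illustration, and the -P probe is an assumption about how to detect support, not something taken from the grep documentation.

```shell
#!/bin/sh
# Sample input from the question.
line='this is a     line'

# Extended syntax (-E): + is a quantifier without a backslash.
# [[:space:]] is a POSIX character class, so it does not rely on Perl extensions.
ere_result=$(printf '%s\n' "$line" | grep -E 'a[[:space:]]+line')
echo "ERE:  $ere_result"

# Perl syntax (-P): only attempted if this grep accepts the option.
if printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
    pcre_result=$(printf '%s\n' "$line" | grep -P 'a\s+line')
    echo "PCRE: $pcre_result"
else
    echo "PCRE: grep -P is not supported by this grep"
fi
```

The [[:space:]] form is the safer choice when the script may run on a grep without Perl extensions.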

David W. answered Nov 03 '22 11:11



By default, grep doesn't support the complete set of regular expression extensions, so try using -P to enable Perl regular expressions. With -P you don't need to escape the +, i.e.

echo "this is a     line" | grep -P 'a\s+line' 
dogbane answered Nov 03 '22 12:11
