
Is there another regular-expression "flavor" in GNU sed?

Tags: regex, bash, sed

I love sed but I hate how many backslashes are needed in its regular expressions. For example, here is a sed command that removes the first 8 words from each line of input:

sed -n 's/^\(\S\+\s\+\)\{8\}\(.*\)/\2/p'

Ugly.

Almost every character has a backslash preceding it. It would be much nicer if sed would assume that special characters were special by default.

Here is how I would like the expression to look:

s/^(\S+\s+){8}(.*)/\2/p

Is there a way to achieve this?

hololeap asked Feb 09 '15 01:02



1 Answer

Switch to ERE in sed

As Avinash Raj has pointed out, sed uses basic regular expression (BRE) syntax by default, in which (, ), {, and } must be preceded by \ to take on their special meaning. The -r option switches to extended regular expression (ERE) syntax, which treats (, ), {, and } as special without a preceding \.
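As a rough illustration of the difference, the sketch below runs the command from the question in both syntaxes. It assumes GNU sed (because of \S and \s), and the ten-word sample line is made up for the example.

```shell
# Sample input: ten words (invented for this illustration).
line='one two three four five six seven eight nine ten'

# BRE (the default): grouping, intervals and + all need a backslash.
bre_out=$(printf '%s\n' "$line" | sed -n 's/^\(\S\+\s\+\)\{8\}\(.*\)/\2/p')

# ERE (-r; GNU sed also accepts -E): the same pattern without the backslashes.
ere_out=$(printf '%s\n' "$line" | sed -rn 's/^(\S+\s+){8}(.*)/\2/p')

# Both commands drop the first eight words and print what is left.
printf '%s\n' "$bre_out" "$ere_out"
```

Both variables should end up holding "nine ten".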

POSIX standard

Except for these escape sequences:

\^    \.    \[    \$    \(    \)    \|
\*    \+    \?    \{    \\

the POSIX standard explicitly leaves the behavior undefined for other escape sequences in ERE.

An ordinary character is an ERE that matches itself. An ordinary character is any character in the supported character set, except for the ERE special characters listed in ERE Special Characters. The interpretation of an ordinary character preceded by a backslash ( '\' ) is undefined.

Since the behavior is undefined, implementations are free to provide extensions to the syntax.

GNU extensions to escape sequences

As rici has noted in a comment, \s and \S are GNU extensions. The GNU implementation also provides the following extensions to both the regular expression and the replacement string syntax (for both BRE and ERE):

\a \f \n \r \t \v
\cX
\dXXX
\oXXX
\xXX

and the following extensions for use in regular expression only:

\w \W
\b \B
\'
\`

Plus these undocumented/under-documented extensions:

\s \S
\< \>
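A few of the escapes listed above can be tried out as follows. This is only a sketch: it assumes GNU sed, and the input strings are invented for the example.

```shell
# GNU sed only: every escape used here is an extension, not POSIX.

# \t in the replacement inserts a tab; \x41 is the character with hex code 41.
tab_out=$(printf 'a,b\n' | sed 's/,/\t/')
hex_out=$(printf 'a\n' | sed 's/a/\x41/')

# \b and the \< \> pair both anchor at word edges, so "concat" is left alone.
b_out=$(printf 'cat concat\n' | sed 's/\bcat\b/dog/g')
angle_out=$(printf 'cat concat\n' | sed 's/\<cat\>/dog/g')

# \w matches a word character (letter, digit or underscore).
w_out=$(printf 'hello, world\n' | sed 's/\w\+/X/g')

printf '%s\n' "$tab_out" "$hex_out" "$b_out" "$angle_out" "$w_out"
```

On a non-GNU sed the same commands may behave differently, since POSIX leaves these escapes undefined.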

If the code will never run on a non-GNU implementation of sed, your current code is acceptable.
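If the script might run elsewhere, one possible approach is to replace \s and \S with POSIX bracket expressions. The sketch below assumes a sed that accepts -E for ERE (GNU and BSD sed both do), and the sample line is again made up.

```shell
# Sample input: ten words (invented for this illustration).
line='one two three four five six seven eight nine ten'

# [[:space:]] and [^[:space:]] are POSIX character classes, so this
# pattern does not rely on the GNU-only \s and \S escapes.
portable_out=$(printf '%s\n' "$line" |
  sed -E -n 's/^([^[:space:]]+[[:space:]]+){8}(.*)/\2/p')

printf '%s\n' "$portable_out"
```

The trade-off is a longer pattern in exchange for not depending on undefined escape sequences.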

2 revs answered Sep 20 '22 23:09
