I love sed, but I hate how many backslashes its regular expressions need. For example, here is a sed command that removes the first 8 words from each line of input:
sed -n 's/^\(\S\+\s\+\)\{8\}\(.*\)/\2/p'
Ugly.
Almost every character has a backslash preceding it. It would be much nicer if sed treated special characters as special by default.
Here is how I would like the expression to look:
s/^(\S+\s){8}(.*)/\2/p
Is there a way to achieve this?
As Avinash Raj has pointed out, sed uses basic regular expression (BRE) syntax by default, which requires (, ), {, and } to be preceded by \ to activate their special meaning. The -r option switches to extended regular expression (ERE) syntax, which treats (, ), {, and } as special without a preceding \.
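As a minimal sketch of the difference, here is the command from the question rewritten with -r. It assumes GNU sed, because \S and \s are GNU extensions, and the function name strip_first_8_words is made up for illustration:

```shell
# Assumes GNU sed: -r selects ERE, and \S / \s are GNU extensions.
# strip_first_8_words is a hypothetical helper name.
strip_first_8_words() {
  # Print only lines with more than 8 leading words, minus those 8 words.
  sed -rn 's/^(\S+\s+){8}(.*)/\2/p'
}

printf 'w1 w2 w3 w4 w5 w6 w7 w8 w9 w10\n' | strip_first_8_words
```

Only the back-reference \2 still needs a backslash. Lines that do not start with 8 words followed by whitespace are not printed, because of -n combined with the p flag.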
Except for these escape sequences:
\^ \. \[ \$ \( \) \|
\* \+ \? \{ \\
the POSIX standard explicitly leaves the behavior undefined for other escape sequences in ERE.
An ordinary character is an ERE that matches itself. An ordinary character is any character in the supported character set, except for the ERE special characters listed in ERE Special Characters. The interpretation of an ordinary character preceded by a backslash ('\') is undefined.
Since the behavior is undefined, implementations are free to provide extensions to the syntax.
As rici has noted in the comments, \s and \S are GNU extensions. The GNU implementation also provides the following extensions for regular expression and replacement string syntax (for both BRE and ERE):
\a \f \n \r \t \v
\cX
\dXXX
\oXXX
\xXX
and the following extensions for use in regular expression only:
\w \W
\b \B
\'
\`
Plus these undocumented/under-documented extensions:
\s \S
\< \>
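For illustration, two of these extensions in use. This sketch assumes GNU sed; other implementations may treat the same escapes differently or reject them. The function names are hypothetical:

```shell
# Assumes GNU sed; \t and \b are GNU extensions.
# Both function names are hypothetical.
tabs_to_spaces() {
  # \t matches a tab character.
  sed 's/\t/ /g'
}

replace_whole_word_cat() {
  # \b matches at a word boundary, so "concat" is left alone.
  sed 's/\bcat\b/dog/g'
}

printf 'a\tb\n' | tabs_to_spaces
echo 'cat concat' | replace_whole_word_cat
```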
If the code never runs on a non-GNU implementation of sed, your current code is acceptable.
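If portability does matter, one possible rewrite avoids the GNU extensions by staying in BRE and using POSIX bracket expressions. This is a sketch, not a tested guarantee for every sed: it replaces \S with [^[:space:]], \s with [[:space:]], and \+ with the interval \{1,\}. The function name is again hypothetical:

```shell
# A sketch using only POSIX BRE constructs (no -r, no \S, \s or \+).
# strip_first_8_words_portable is a hypothetical helper name.
strip_first_8_words_portable() {
  sed -n 's/^\([^[:space:]]\{1,\}[[:space:]]\{1,\}\)\{8\}\(.*\)/\2/p'
}

printf 'w1 w2 w3 w4 w5 w6 w7 w8 w9 w10\n' | strip_first_8_words_portable
```

The price is even more backslashes than the original, which is the trade-off between readability and portability here.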