
What do I need to quote in sed command lines?

Tags: regex, bash, sh, sed

There are many questions on this site on how to escape various elements for sed, but I'm looking for a more general answer. I understand that I might want to escape some characters to avoid shell expansion:

Bash:

  • Single quoted [strings] ('') are used to preserve the literal value of each character enclosed within the quotes. [However,] a single quote may not occur between single quotes, even when preceded by a backslash.
  • The backslash retains its meaning [in double quoted strings] only when followed by dollar, backtick, double quote, backslash or newline. Within double quotes, the backslashes are removed from the input stream when followed by one of these characters. Backslashes preceding characters that don't have a special meaning are left unmodified for processing by the shell interpreter.

sh: (I hope you don't have history expansion)

  • Single quoted string behaviour: same as bash
  • Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of dollar, backtick, backslash, and, when history expansion is enabled, exclamation mark.
    • The characters dollar and backtick retain their special meaning within double quotes.
    • The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or newline. A double quote may be quoted within double quotes by preceding it with a backslash.
    • If enabled, history expansion will be performed unless an exclamation mark appearing in double quotes is escaped using a backslash. The backslash preceding the ! is not removed.
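
The quoting rules above can be checked by printing the arguments a command would receive. This is a minimal sketch; show_args is a hypothetical helper, not something from the manuals quoted here. It suggests that the sed script from this question reaches sed unchanged under either kind of quoting, so the shell is not what consumes those backslashes.

```shell
# Hypothetical helper: print each argument exactly as a program such as
# sed would receive it, wrapped in <> so whitespace stays visible.
show_args() { printf '<%s>\n' "$@"; }

# Single quotes: every character, backslashes included, arrives unchanged.
single=$(show_args 's#\(\w\+\) #\1/#g')

# Double quotes: a backslash is only removed before $, `, ", \ or newline,
# so \( \w \+ \) and \1 keep their backslashes here as well.
double=$(show_args "s#\(\w\+\) #\1/#g")

# Double quotes where the backslash IS special: \\ becomes \ and \$ becomes $.
escaped=$(show_args "\\1 \$HOME")

printf '%s\n' "$single" "$double" "$escaped"
# <s#\(\w\+\) #\1/#g>
# <s#\(\w\+\) #\1/#g>
# <\1 $HOME>
```

Single quotes remain the safer default for sed scripts, since nothing inside them is touched by the shell.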

...but none of that explains why this stops working as soon as you remove any escaping:

sed -e "s#\(\w\+\) #\1\/#g" #find a sequence of characters in a line
#    why? ↑   ↑ ↑     ↑     #replace the following space with a slash.

None of (, ), / or + (or [, or ]...) seem to have any special meaning that requires them to be escaped in order to work. Hell, even calling the command directly through Python makes sed not work properly, although the manpage doesn't seem to spell out anything about this (not when I search for backslash, anyway.)

$ lvdisplay -C --noheadings -o vg_name,name > test
$ python
>>> import os
>>> #Python requires backslash escaping of \1, even in triple quotes
>>> #lest \1 is read to mean "byte with value 0x01".
>>> output = os.execl("/bin/sed", "-e", "s#(\w+) #\\1/#g", "test")
(Output remains unchanged)
$ python
>>> import os
>>> output = os.execl("/bin/sed", "-e", "s#\(\w\+\) #\\1\/#g", "test")
(Correct output)
$ WHAT THE HELL
Have you tried using jQuery? It's perfect and it does all the things.
badp asked Oct 21 '25 06:10


2 Answers

If I understood you right, your problem is not about bash/sh, it is about the regex flavour sed uses by default: BRE.

The other [= anything but dot, star, caret and dollar] BRE metacharacters require a backslash to give them their special meaning. The reason is that the oldest versions of UNIX grep did not support these.

A group ( ... ) must be written as \( ... \) to get its special meaning, and the same goes for +; otherwise sed matches them as literal characters. That's why the pattern in your s#\(\w\+\) #...# needs those backslashes. The replacement part doesn't need escaping (apart from the \1backreference itself; / is not special when # is the delimiter), so:

sed 's#\(\w\+\) #\1 /#' 

should work.

sed usually has an option to use extended regular expressions (where ?, +, |, (), {m,n} are special without backslashes); e.g. GNU sed has -r, so your one-liner could be:

sed -r 's#(\w+) #\1 /#'

I paste some examples here that may help you understand what's going on:

kent$  echo "abcd "|sed 's#\(\w\+\) #\1 /#'
abcd /
kent$  echo "abcd "|sed -r 's#(\w+) #\1 /#'
abcd /
kent$  echo "(abcd+) "|sed 's#(\w*+) #&/#'
(abcd+) /
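
To get what the question originally asked for (replace the space after each word with a slash, rather than append one), something like the following should work. It is a sketch that assumes GNU sed for the first two commands, since \w and \+ are GNU extensions; the third spells the same pattern with POSIX constructs only. The input line is a made-up sample, not real lvdisplay output.

```shell
# Made-up sample line standing in for one row of "vg_name name" output.
input='vg0 root'

# BRE (sed's default): ( ) and + need backslashes to become special.
# With # as the delimiter, / is an ordinary character and needs no backslash.
bre=$(printf '%s\n' "$input" | sed 's#\(\w\+\) #\1/#g')

# ERE (-E, also spelled -r in GNU sed): ( ) and + are special as written.
ere=$(printf '%s\n' "$input" | sed -E 's#(\w+) #\1/#g')

# Portable BRE spelling that avoids the GNU-only \w and \+.
posix=$(printf '%s\n' "$input" | sed 's#\([[:alnum:]_]\{1,\}\) #\1/#g')

printf '%s\n' "$bre" "$ere" "$posix"   # each line: vg0/root
```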
Kent answered Oct 23 '25 21:10


What you're observing is correct. Certain characters like ?, +, (, ), {, } need to be escaped when using basic regular expressions if you want them to act as special characters.

Quoting from the sed manual:

The only difference between basic and extended regular expressions is in the behavior of a few characters: ‘?’, ‘+’, parentheses, and braces (‘{}’). While basic regular expressions require these to be escaped if you want them to behave as special characters, when using extended regular expressions you must escape them if you want them to match a literal character.

(Emphasis mine.) These don't need to be escaped, though, when using extended regexps, except when matching a literal character (as mentioned in the last line quoted above.)
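
The flip side described in the manual can be seen by matching literal parentheses and a literal plus sign. This is a small sketch with a made-up input string; -E is assumed to be available (it is in GNU sed, where -r is an equivalent spelling).

```shell
# Made-up input containing literal ( ) and + characters.
input='f(x)+1'

# BRE: unescaped ( ) and + are literal, so this matches the text "(x)+".
bre=$(printf '%s\n' "$input" | sed 's/(x)+/[&]/')

# ERE: the same characters must be escaped to match literally.
ere=$(printf '%s\n' "$input" | sed -E 's/\(x\)\+/[&]/')

# ERE, unescaped: (x)+ means "the group x, one or more times",
# so only the letter x is matched.
grouped=$(printf '%s\n' "$input" | sed -E 's/(x)+/[&]/')

printf '%s\n' "$bre" "$ere" "$grouped"
# f[(x)+]1
# f[(x)+]1
# f([x])+1
```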

devnull answered Oct 23 '25 22:10


