Escaping question mark character in sed bash script variable

Tags: regex, bash, sed

I have a set of saved HTML files containing links of the form http://mywebsite.com/showfile.cgi?key=somenumber, and I want to remove the question mark. (Side story: Firefox seems to dislike ? and randomly converts it to %3F. I'm sure there's some magic fix, but that's for another question.)

However, I think the question mark is not being read, stored, or handled properly when bash stores the sed options in a variable:

# Doesn't work (no pattern matched)
SED_OPTIONS='-i s/\.cgi\?key/\.cgikey/g'

# Works e.g. http://mywebsite.com/showfileblah?key=somenumber
SED_OPTIONS='-i s/\.cgi/blah/g'

# Leaves question mark in e.g. http://mywebsite.com/showfile.blah?key=somenumber
SED_OPTIONS='-i s/cgi\?/blah/g'

# Actual sed command run when using SED_OPTIONS (I define FILES earlier in
# the code)
sed $SED_OPTIONS $FILES

# Not using the SED_OPTIONS variable works
# e.g. http://mywebsite.com/showfile.cgikey=somenumber
sed -i s/\.cgi\?key/\.cgikey/g $FILES

How can I get the full command to work using the SED_OPTIONS variable?

user3553107 asked Jul 06 '14 19:07


2 Answers

The safest way to store a list of options and arguments in variables is to use an array:

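Also:

  • You're using a basic regular expression (no -r or -E option), so an unescaped ? is not a special character and needs no escaping. With GNU sed, writing \? in a basic regular expression actually turns it into a "zero or one" quantifier, which is why the single-quoted pattern in SED_OPTIONS did not match.
  • In the replacement string, which is not a regex, do not escape the period (.).
  • There is no need for the g option, since you're only replacing one occurrence per line.
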
# Create array with individual options/arguments.
SED_ARGS=( '-i' 's/\.cgi?key/.cgikey/' )

# Invoke `sed` with array - note the double-quoting.
sed "${SED_ARGS[@]}" $FILES
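
To illustrate why an array is safer than a plain string, the following sketch counts the arguments that a command would receive in each case. The helper count_args and the sample script 's/old text/new text/' are assumptions made up for this illustration; they are not part of the original question.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

# Options kept in one string: the unquoted expansion is split on whitespace,
# so the sed script is broken into several words.
STRING_OPTS='-e s/old text/new text/'
string_count=$(count_args $STRING_OPTS)

# Options kept in an array: each element is passed as exactly one argument.
ARRAY_OPTS=( -e 's/old text/new text/' )
array_count=$(count_args "${ARRAY_OPTS[@]}")

printf 'string: %s arguments, array: %s arguments\n' "$string_count" "$array_count"
```

With the string, the script arrives as three separate fragments after -e; with the array, it arrives intact as a single argument.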

Similarly, it would be safer to use an array for the list of input files. $FILES will only work if the individual filenames contain no embedded whitespace or other elements subject to shell expansions.
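
A minimal sketch of the file-list array, assuming a scratch directory and a made-up filename with an embedded space (both hypothetical). It uses -i.bak with an attached suffix, a form that both GNU sed and BSD sed accept, so a backup copy is left next to each edited file.

```shell
# Hypothetical setup: a scratch directory holding one file whose name has a space.
tmpdir=$(mktemp -d)
printf '%s\n' 'http://mywebsite.com/showfile.cgi?key=1' > "$tmpdir/saved page.html"

# The glob expands to one array element per file, spaces and all.
FILES=( "$tmpdir"/*.html )
SED_ARGS=( -i.bak 's/\.cgi?key/.cgikey/' )

# Double-quote both array expansions so each element stays one argument.
sed "${SED_ARGS[@]}" "${FILES[@]}"

result=$(cat "$tmpdir/saved page.html")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```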

Generally:

  • Single-quote string literals, such as the sed script here - to prevent the shell from interpreting them.
  • Double-quote variable references, to prevent the shell from performing additional operations on them, such as pathname expansion (globbing) and word splitting (splitting into multiple tokens by whitespace).
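
The effect of leaving a variable reference unquoted can be seen in a small experiment. This is only a sketch: the scratch directory, the file name abc, and the variable name pattern are all invented for the demonstration.

```shell
# Hypothetical setup: a scratch directory with a file named "abc".
tmpdir=$(mktemp -d)
touch "$tmpdir/abc"

pattern='a?c'

# Unquoted: the shell treats ? as a glob character and substitutes the
# matching filename. (The cd happens inside the command substitution's
# subshell, so the current directory of the script is unchanged.)
unquoted=$(cd "$tmpdir" && printf '%s\n' $pattern)

# Double-quoted: the value is passed through exactly as stored.
quoted=$(cd "$tmpdir" && printf '%s\n' "$pattern")

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
rm -rf "$tmpdir"
```

The same kind of expansion can hit a sed script that contains ? or * if it is expanded without quotes and a matching path happens to exist.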
mklement0 answered Nov 01 '22 08:11


I suggest storing the arguments for sed in an array:

SED_OPTIONS=( '-i' '-e' 's/\.cgi?key/\.cgikey/g' )

sed "${SED_OPTIONS[@]}" $FILES

However, that's only a part of the trouble.

First, note that when you type:

sed -i s/\.cgi\?key/\.cgikey/g $FILES

what sed sees as the script argument is actually:

s/.cgi?key/.cgikey/g

because you didn't use any quotes to preserve the backslashes. (To demonstrate, use printf "%s\n" s/\.cgi\?key/\.cgikey/g, thus avoiding any questions of whether echo is interpreting the backslashes.) One side effect of this is that a URL such as:

http://example.com/nodotcgi?key=value

will be mapped to:

http://example.com/nodo.cgikey=value
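
Both effects can be checked without touching any files. The sketch below reuses the example URL from above; the variable names unquoted, quoted, and loose are made up for the illustration.

```shell
# Unquoted: the shell strips the backslashes before the command sees the word.
unquoted=$(printf '%s\n' s/\.cgi\?key/\.cgikey/g)

# Single-quoted: the backslashes are passed through untouched.
quoted=$(printf '%s\n' 's/\.cgi\?key/\.cgikey/g')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

# With the backslashes gone, "." matches any character, so the "t" in
# "nodotcgi" is consumed by the match and replaced along with the rest.
loose=$(printf '%s\n' 'http://example.com/nodotcgi?key=value' |
        sed -e 's/.cgi?key/.cgikey/g')
printf '%s\n' "$loose"
```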

Using the single quotes when setting SED_OPTIONS ensures that the backslashes are preserved where required, and not putting a backslash before the ? works. I have both GNU sed and BSD sed on my Mac; I've aliased them as gnu-sed and bsd-sed for clarity. Note that BSD sed requires a suffix for -i and won't accept standard input with -i. So, I've dropped the -i from the commands.

$ URLS=(http://example.com/script.cgi?key=value http://example.com/nodotcgi?key=value)
$ SED_OPTIONS=( '-e' 's/\.cgi?key/\.cgikey/g' )
$ printf "%s\n" "${URLS[@]}" | bsd-sed "${SED_OPTIONS[@]}"
http://example.com/script.cgikey=value
http://example.com/nodotcgi?key=value
$ printf "%s\n" "${URLS[@]}" | gnu-sed "${SED_OPTIONS[@]}"
http://example.com/script.cgikey=value
http://example.com/nodotcgi?key=value
$ SED_OPTIONS=( '-e' 's/\.cgi\?key/\.cgikey/g' )
$ printf "%s\n" "${URLS[@]}" | bsd-sed "${SED_OPTIONS[@]}"
http://example.com/script.cgikey=value
http://example.com/nodotcgi?key=value
$ printf "%s\n" "${URLS[@]}" | gnu-sed "${SED_OPTIONS[@]}"
http://example.com/script.cgi?key=value
http://example.com/nodotcgi?key=value
$

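Note the difference in behaviour between the two versions of sed when there's a backslash before the question mark (second part of the example). GNU sed treats \? in a basic regular expression as a "zero or one" quantifier applied to the preceding i, so the pattern no longer matches a literal question mark and the first URL is left unchanged. BSD sed, as the output shows, treats \? as a literal question mark.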
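
One way to sidestep the difference, offered here as a suggestion rather than something taken from the transcript above, is to put the question mark inside a bracket expression. Inside brackets, ? is an ordinary character in both basic and extended regular expressions, so the same script should behave identically with or without -E. The variable names below are illustrative.

```shell
urls=( 'http://example.com/script.cgi?key=value'
       'http://example.com/nodotcgi?key=value' )

# [?] matches a literal question mark regardless of regex flavour.
SED_OPTIONS=( -e 's/\.cgi[?]key/.cgikey/' )

# Basic regular expressions (the default).
result=$(printf '%s\n' "${urls[@]}" | sed "${SED_OPTIONS[@]}")

# Extended regular expressions (-E).
result_ere=$(printf '%s\n' "${urls[@]}" | sed -E "${SED_OPTIONS[@]}")

printf '%s\n' "$result"
```

Only the first URL should change, since the second has no literal .cgi before the question mark.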

Jonathan Leffler answered Nov 01 '22 10:11