I'm wondering whether it is possible to write a 100% reliable sed command to escape any regex metacharacters in an input string so that it can be used in a subsequent sed command. Like this:

    #!/bin/bash
    # Trying to replace one regex by another in an input file with sed

    search="/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3"
    replace="/xyz\n\t[0-9]\+\([^ ]\)\{2,3\}\3"

    # Sanitize input
    search=$(sed 'script to escape' <<< "$search")
    replace=$(sed 'script to escape' <<< "$replace")

    # Use it in a sed command
    sed "s/$search/$replace/" input
I know that there are better tools to work with fixed strings instead of patterns, for example awk, perl or python. I would just like to prove whether it is possible or not with sed. I would say let's concentrate on basic POSIX regexes to have even more fun! :)
I have tried a lot of things, but every time I could find an input that broke my attempt. I thought keeping it abstract as 'script to escape' would not lead anybody in the wrong direction.
Btw, the discussion came up here. I thought this could be a good place to collect solutions and probably break and/or elaborate them.
To match any of the metacharacters literally, one needs to escape these characters using a backslash ( \ ) to suppress their special meaning.
Regular expressions are used by several different Unix commands, including ed, sed, awk, grep, and to a more limited extent, vi.
Put a backslash before $.*/[\]^ and only those characters (but not inside bracket expressions).
Escape Sequences (\char): To match a character that has a special meaning in a regex, prefix it with a backslash ( \ ). E.g., \. matches "."; in extended-regex syntax, \+ matches "+" and \( matches "(".
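As a rough illustration of the backslash approach for basic regular expressions, the sketch below prefixes each of those characters with a backslash. The name escape_bre is made up for this example, the behavior was only reasoned through for GNU sed, and it covers single-line input only.

```shell
# escape_bre is a hypothetical helper name, not part of sed.
# It backslash-escapes the BRE-special characters ] \ / $ * . ^ [
# (']' comes first inside the bracket expression so it is taken literally).
escape_bre() {
  printf '%s\n' "$1" | sed 's/[]\/$*.^[]/\\&/g'
}

pattern=$(escape_bre 'a.b*c/d')
printf '%s\n' "$pattern"     # prints a\.b\*c\/d

# The escaped pattern matches only the literal text, not 'axb*c/d'.
printf '%s\n' 'axb*c/d a.b*c/d' | sed "s/$pattern/OK/"   # prints axb*c/d OK
```

Unlike the bracket-based approach discussed in the answers, this depends on knowing the exact set of special characters for one regex dialect.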
Note:
- If you're looking for prepackaged functionality based on the techniques discussed in this answer:
  - bash functions that enable robust escaping even in multi-line substitutions can be found at the bottom of this post (plus a perl solution that uses perl's built-in support for such escaping).
  - Another answer in this thread contains a tool (a bash script) that robustly performs single-line substitutions.
    - That answer uses a variant of the sed command used below, which is needed if you want to escape string literals for potential use with other regex-processing tools, such as awk and perl. In short: for cross-tool use, \ must be escaped as \\ rather than as [\], which means: instead of the
          sed 's/[^^]/[&]/g; s/\^/\\^/g'
      command used below, you must use
          sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g'
- All snippets assume bash as the shell (POSIX-compliant reformulations are possible):
Escaping a string literal for use as a regex in sed:

To give credit where credit is due: I found the regex used below in this answer.
Assuming that the search string is a single-line string:

    search='abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3' # sample input containing metachars.

    searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search") # escape it.

    sed -n "s/$searchEscaped/foo/p" <<<"$search" # if ok, echoes 'foo'
- Every character except ^ is placed in its own character set [...] expression to treat it as a literal.
  - Note that ^ is the one char. you cannot represent as [^], because it has special meaning in that location (negation).
- Then, ^ chars. are escaped as \^.
  - Note that you cannot just escape every char. by putting a \ in front of it, because that can turn a literal char. into a metachar., e.g. \< and \b are word boundaries in some tools, \n is a newline, \{ is the start of a RE interval like \{1,3\}, etc.

The approach is robust, but not efficient.
The robustness comes from not trying to anticipate all special regex characters - which will vary across regex dialects - but to focus on only 2 features shared by all regex dialects:
- the ability to specify literal characters inside a character set ([...])
- the ability to escape a literal ^ as \^
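To make the two steps concrete, here is a minimal sketch that wraps the escaping command in a function and prints what it produces. The name quote_re is hypothetical, and printf piped into sed is used instead of a here-string only so that the snippet does not depend on bash.

```shell
# quote_re is a hypothetical name for the single-line escaping step.
quote_re() {
  printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'
}

quote_re 'a.b^c'     # prints [a][.][b]\^[c]

# '.' is now literal, so 'axb' is left alone and only 'a.b' is replaced.
printf '%s\n' 'axb a.b' | sed "s/$(quote_re 'a.b')/X/"   # prints axb X
```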
Escaping a string literal for use as the replacement string in sed's s/// command:

The replacement string in a sed s/// command is not a regex, but it recognizes placeholders that refer to either the entire string matched by the regex (&) or specific capture-group results by index (\1, \2, ...), so these must be escaped, along with the (customary) regex delimiter, /.
Assuming that the replacement string is a single-line string:

    replace='Laurel & Hardy; PS\2' # sample input containing metachars.

    replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$replace") # escape it

    sed -n "s/\(.*\) \(.*\)/$replaceEscaped/p" <<<"foo bar" # if ok, outputs $replace as is
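The effect of the replacement-side escaping can be seen in isolation with a small sketch like the following; quote_subst is a hypothetical name, and the sample string is chosen to contain all three characters that the command escapes.

```shell
# quote_subst is a hypothetical name for the single-line escaping step.
quote_subst() {
  printf '%s\n' "$1" | sed 's/[&/\]/\\&/g'
}

quote_subst 'a&b/c\1'    # prints a\&b\/c\\1

# The escaped text is inserted verbatim; & and \1 are not expanded.
printf '%s\n' 'x' | sed "s/.*/$(quote_subst 'a&b/c\1')/"   # prints a&b/c\1
```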
Escaping a multi-line string literal for use as a regex in sed:

Note: This only makes sense if multiple input lines (possibly ALL) have been read before attempting to match.
Since tools such as sed and awk operate on a single line at a time by default, extra steps are needed to make them read more than one line at a time.

    # Define sample multi-line literal.
    search='/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3
    /def\n\t[A-Z]\+\([^ ]\)\{3,4\}\4'

    # Escape it.
    searchEscaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n')

    # Use in a Sed command that reads ALL input lines up front.
    # If ok, echoes 'foo'
    sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchEscaped/foo/p" <<<"$search"
- The newlines in multi-line input strings must be translated to '\n' strings, which is how newlines are encoded in a regex.
- $!a\'$'\n''\\n' appends string '\n' to every output line but the last (the last newline is ignored, because it was added by <<<)
- tr -d '\n' then removes all actual newlines from the string (sed adds one whenever it prints its pattern space), effectively replacing all newlines in the input with '\n' strings.
- -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop, therefore leaving subsequent commands to operate on all input lines at once.
  - If you're using GNU sed (only), you can use its -z option to simplify reading all input lines at once:
        sed -z "s/$searchEscaped/foo/" <<<"$search"
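For comparison, here is a sketch of the same idea that avoids the a command and the bash-specific $'\n' quoting. It is a swapped-in variant, not the command from the answer: it appends the two characters \n to every line but the last with '$!s/$/\\n/'. The name quote_re_multiline is hypothetical, and the behavior was only reasoned through for GNU sed.

```shell
# quote_re_multiline is a hypothetical name. It escapes each line as in the
# single-line case, appends a literal \n to every line but the last, and
# then deletes the real newlines so the result is a single-line regex.
quote_re_multiline() {
  printf '%s\n' "$1" |
    sed -e 's/[^^]/[&]/g; s/\^/\\^/g' -e '$!s/$/\\n/' |
    tr -d '\n'
}

literal='a.b
c*'
escaped=$(quote_re_multiline "$literal")
printf '%s\n' "$escaped"    # prints [a][.][b]\n[c][*]

# Read all input into the pattern space first, then substitute.
# Prints three lines: x, FOO, y
printf '%s\n' 'x' 'a.b' 'c*' 'y' |
  sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$escaped/FOO/"
```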
Escaping a multi-line string literal for use as the replacement string in sed's s/// command:

    # Define sample multi-line literal.
    replace='Laurel & Hardy; PS\2
    Masters\1 & Johnson\2'

    # Escape it for use as a Sed replacement string.
    IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$replace")
    replaceEscaped=${REPLY%$'\n'}

    # If ok, outputs $replace as is.
    sed -n "s/\(.*\) \(.*\)/$replaceEscaped/p" <<<"foo bar"
- Newlines in the input string must be kept as actual newlines, but \-escaped.
- -e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop.
- s/[&/\]/\\&/g escapes all &, \ and / instances, as in the single-line solution.
- s/\n/\\&/g then \-prefixes all actual newlines.
- IFS= read -d '' -r is used to read the sed command's output as is (to avoid the automatic removal of trailing newlines that a command substitution ($(...)) would perform).
- ${REPLY%$'\n'} then removes a single trailing newline, which the <<< has implicitly appended to the input.

bash functions based on the above (for sed):
- quoteRe() quotes (escapes) for use in a regex
- quoteSubst() quotes for use in the substitution string of a s/// call.
- Because sed reads a single line at a time by default, use of quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
- Using command substitutions ($(...)) to call the functions won't work for strings that have trailing newlines; in that event, use something like IFS= read -d '' -r escapedValue < <(quoteSubst "$value")
    # SYNOPSIS
    #   quoteRe <text>
    quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }

    # SYNOPSIS
    #   quoteSubst <text>
    quoteSubst() {
      IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
      printf %s "${REPLY%$'\n'}"
    }
Example:

    from=$'Cost\(*):\n$3.' # sample input containing metachars.
    to='You & I'$'\n''eating A\1 sauce.' # sample replacement string with metachars.

    # Should print the unmodified value of $to
    sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quoteRe "$from")/$(quoteSubst "$to")/" <<<"$from"

Note the use of -e ':a' -e '$!{N;ba' -e '}' to read all input at once, so that the multi-line substitution works.
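Since the question invites attempts to break the escaping, one way to probe it is a round-trip check: replace a string with itself and confirm the text comes back unchanged. The sketch below does this for single-line strings only. The helper names and sample strings are made up for illustration, and a clean run shows only that these particular inputs survive, not that the approach is proven.

```shell
# Hypothetical helper names; single-line forms of the escaping commands.
quote_re()    { printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'; }
quote_subst() { printf '%s\n' "$1" | sed 's/[&/\]/\\&/g'; }

# Replace literal $1 with literal $2 in an input that is exactly $1;
# if both escapes are correct, the output is exactly $2.
roundtrip() {
  printf '%s\n' "$1" | sed "s/$(quote_re "$1")/$(quote_subst "$2")/"
}

failures=0
for s in 'a.b*c' '^start$' '[x]\{2\}' 'path/to/file' \
         'Laurel & Hardy; PS\2' '\n\t\\'; do
  out=$(roundtrip "$s" "$s")
  if [ "$out" != "$s" ]; then
    failures=$((failures + 1))
    printf 'FAILED: %s -> %s\n' "$s" "$out"
  fi
done
printf 'failures: %d\n' "$failures"
```

Adding more awkward strings to the loop is a cheap way to look for counterexamples.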
perl solution:

Perl has built-in support for escaping arbitrary strings for literal use in a regex: the quotemeta() function or its equivalent \Q...\E quoting.
The approach is the same for both single- and multi-line strings; for example:

    from=$'Cost\(*):\n$3.' # sample input containing metachars.
    to='You owe me $1/$& for'$'\n''eating A\1 sauce.' # sample replacement string w/ metachars.

    # Should print the unmodified value of $to.
    # Note that the replacement value needs NO escaping.
    perl -s -0777 -pe 's/\Q$from\E/$to/' -- -from="$from" -to="$to" <<<"$from"

Note the use of -0777 to read all input at once, so that the multi-line substitution works.
The -s option allows placing -<var>=<val>-style Perl variable definitions following -- after the script, before any filename operands.
Building upon @mklement0's answer in this thread, the following tool will replace any single-line string (as opposed to regexp) with any other single-line string using sed and bash:

    $ cat sedstr
    #!/bin/bash
    old="$1"
    new="$2"
    file="${3:--}"
    escOld=$(sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g' <<< "$old")
    escNew=$(sed 's/[&/\]/\\&/g' <<< "$new")
    sed "s/$escOld/$escNew/g" "$file"
To illustrate the need for this tool, consider trying to replace a.*/b{2,}\nc with d&e\1f by calling sed directly:

    $ cat file
    a.*/b{2,}\nc
    axx/bb\nc

    $ sed 's/a.*/b{2,}\nc/d&e\1f/' file
    sed: -e expression #1, char 16: unknown option to `s'
    $ sed 's/a.*\/b{2,}\nc/d&e\1f/' file
    sed: -e expression #1, char 23: invalid reference \1 on `s' command's RHS
    $ sed 's/a.*\/b{2,}\nc/d&e\\1f/' file
    a.*/b{2,}\nc
    axx/bb\nc
    # .... and so on, peeling the onion ad nauseam until:
    $ sed 's/a\.\*\/b{2,}\\nc/d\&e\\1f/' file
    d&e\1f
    axx/bb\nc

or use the above tool:

    $ sedstr 'a.*/b{2,}\nc' 'd&e\1f' file
    d&e\1f
    axx/bb\nc
The reason this is useful is that it can be easily augmented to use word-delimiters to replace words if necessary, e.g. in GNU sed syntax:

    sed "s/\<$escOld\>/$escNew/g" "$file"

whereas the tools that actually operate on strings (e.g. awk's index()) cannot use word-delimiters.
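A small sketch of the word-delimiter variant follows, assuming GNU sed for \< and \>. The helper name quote_re and the sample text are made up; the escaping command is the one from the sedstr script.

```shell
# quote_re is a hypothetical wrapper around the escaping used in sedstr.
quote_re() {
  printf '%s\n' "$1" | sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g'
}

old='a.c'
# Only the standalone 'a.c' is replaced; 'xa.c', 'a.cx' and 'abc' are kept.
printf '%s\n' 'a.c xa.c a.cx abc' | sed "s/\<$(quote_re "$old")\>/NEW/g"
# prints NEW xa.c a.cx abc
```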
NOTE: the reason to not wrap \ in a bracket expression is that if you were using a tool that accepts [\]] as a literal ] inside a bracket expression (e.g. perl and most awk implementations) to do the actual final substitution (i.e. instead of sed "s/$escOld/$escNew/g") then you couldn't use the approach of:

    sed 's/[^^]/[&]/g; s/\^/\\^/g'

to escape \ by enclosing it in [], because then \x would become [\][x], which such a tool reads as a single bracket expression matching any one of ], [ or x. Instead you'd need:

    sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g'

So while [\] is probably OK for all current sed implementations, we know that \\ will work for all sed, awk, perl, etc. implementations and so use that form of escaping.
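The difference between the two escaping forms is easiest to see side by side. In this sketch the function names are hypothetical, and only the sed behavior is exercised.

```shell
# Hypothetical names for the two escaping variants discussed above.
quote_re_bracket() { printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'; }
quote_re_cross()   { printf '%s\n' "$1" | sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g'; }

quote_re_bracket 'a\x'   # prints [a][\][x]
quote_re_cross   'a\x'   # prints [a]\\[x]

# Both forms are expected to match the literal text in sed itself:
printf '%s\n' 'a\x' | sed "s/$(quote_re_bracket 'a\x')/OK/"   # prints OK
printf '%s\n' 'a\x' | sed "s/$(quote_re_cross 'a\x')/OK/"     # prints OK
```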