
How to search & replace arbitrary literal strings in sed and awk (and perl)

Tags: bash, sed, awk, perl, xxd

Say we have some arbitrary literals in a file that we need to replace with some other literal.

Normally, we'd just reach for sed(1) or awk(1) and code something like:

sed "s/$target/$replacement/g" file.txt

But what if $target and/or $replacement could contain characters that are special to sed(1), such as regular expression metacharacters? You could escape them, but suppose you don't know what they are - they are arbitrary, ok? You'd need to code up something to escape all possible special characters - including the '/' separator, e.g.

t=$( echo "$target" | sed 's/\./\\./g; s/\*/\\*/g; s/\[/\\[/g; ...' ) # arghhh!

That's pretty awkward for such a simple problem.
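One way to sidestep the escaping altogether, offered as a minimal sketch rather than a definitive answer, is to let awk(1) do a plain substring search with index() and rebuild each line with substr(). The helper name awk_litsub and the environment variable names T and R are made up for illustration. The strings go through the environment because awk -v would interpret backslash escapes in them.

```shell
#!/bin/bash
# awk_litsub TARGET REPLACEMENT [FILE...]   (hypothetical helper name)
# Replaces every literal TARGET with literal REPLACEMENT, line by line.
awk_litsub() {
    local target="$1" replacement="$2"
    shift 2
    # T and R are visible only to this awk process.
    T="$target" R="$replacement" awk '
        BEGIN { t = ENVIRON["T"]; r = ENVIRON["R"]; n = length(t) }
        {
            line = $0; out = ""
            # index() is a plain substring search, no regex involved
            while (n > 0 && (i = index(line, t)) > 0) {
                out = out substr(line, 1, i - 1) r
                line = substr(line, i + n)
            }
            print out line
        }' "$@"
}

printf '%s\n' 'ah/{yes' 'a.b*c' | awk_litsub '/{' '&'
```

This works within a single line only, so a target containing a newline will never match.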

perl(1) has \Q ... \E quotes but even that can't cope with the '/' separator in $target.

perl -pe "s/\Q$target\E/$replacement/g" file.txt

I just posted an answer!! So my real question is, "is there a better way to do literal replacements in sed/awk/perl?"

If not, I'll leave this here in case it comes in useful.

asked Jan 06 '19 07:01 by wef


2 Answers

Perl's quotemeta function, which implements \Q, does exactly what you ask for. From its documentation:

all ASCII characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash

Since this is presumably in a shell script, the real problem is how and when shell variables get interpolated, and therefore what the Perl program ends up seeing.

The best way is to avoid working out that interpolation mess and instead properly pass those shell variables to the Perl one-liner. This can be done in several ways; see this post for details.

Either pass the shell variables simply as arguments

#!/bin/bash

# define $target and $replacement

perl -pe 'BEGIN { $patt = shift; $repl = shift } s{\Q$patt}{$repl}g' "$target" "$replacement" file.txt

where the needed arguments are removed from @ARGV in a BEGIN block, that is, before the main loop starts reading input; then file.txt gets processed. There is no need for \E here because \Q stays in effect to the end of the pattern.

Or, use the -s switch, which enables command-line switches for the program

# define $target, etc

perl -s -pe 's{\Q$patt}{$repl}g' -- -patt="$target" -repl="$replacement" file.txt

The -- is needed to mark the start of arguments, and switches must come before filenames.

Finally, you can also export the shell variables, which can then be used in the Perl script via %ENV; but in general I'd rather recommend either of the above two approaches.


A full example

#!/bin/bash
# Last modified: 2019 Jan 06 (22:15)

target="/{"
replacement="&"

echo "Replace $target with $replacement"

perl -wE'
    BEGIN { $p = shift; $r = shift }; 
    $_=q(ah/{yes); s/\Q$p/$r/; say
' "$target" "$replacement"

This prints

Replace /{ with &
ah&yes

where I've used characters mentioned in a comment.

The other way

#!/bin/bash
# Last modified: 2019 Jan 06 (22:05)

target="/{"
replacement="&"

echo "Replace $target with $replacement"

perl -s -wE'$_ = q(ah/{yes); s/\Q$patt/$repl/; say' \
    -- -patt="$target" -repl="$replacement"

where code is broken over lines for readability here (and thus needs the \). Same printout.

answered Sep 19 '22 05:09 by zdim


Me again!

Here's a simpler way using xxd(1):

t=$( printf '%s' "$target" | xxd -p | tr -d '\n')
r=$( printf '%s' "$replacement" | xxd -p | tr -d '\n')
xxd -p file.txt | tr -d '\n' | sed "s/$t/$r/g" | xxd -p -r

... so we're hex-encoding the original text with xxd(1) and doing search-replacement using hex-encoded search strings. Finally we hex-decode the result.

EDIT: I forgot to remove \n from the xxd output (| tr -d '\n') so that patterns can span the 60-column output of xxd. Of course, this relies on GNU sed's ability to operate on very long lines (limited only by memory).
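One thing to watch with a plain hex stream is that a match can begin in the middle of a byte: the bytes 0x16 0x10 encode as 1610, which contains 61, the code for "a". The sketch below is one possible variation, not part of the original answer. It puts a space in front of every byte so that matches can only start on a byte boundary. The helper names to_hex and hex_litsub are hypothetical, and GNU sed is assumed for the very long, newline-free line.

```shell
#!/bin/bash
# to_hex: encode stdin as hex with a space before every byte, e.g. " 61 62".
to_hex() { xxd -p | tr -d '\n' | sed 's/../ &/g'; }

# hex_litsub TARGET REPLACEMENT < input > output   (hypothetical helper name)
hex_litsub() {
    local t r
    t=$(printf '%s' "$1" | to_hex)
    r=$(printf '%s' "$2" | to_hex)
    # An empty target would leave sed with an empty regex: pass input through.
    [ -n "$t" ] || { cat; return; }
    # The patterns contain only hex digits and spaces, so nothing needs escaping.
    to_hex | sed "s/$t/$r/g" | tr -d ' ' | xxd -r -p
}

# Multi-line target and replacement.
printf 'foo\nbar\n' | hex_litsub "$(printf 'foo\nbar')" "$(printf 'bar\nfoo')"

# 0x16 0x10 must stay untouched when replacing "a" (0x61); show result as hex.
printf '\026\020\n' | hex_litsub 'a' 'X' | xxd -p
```

The stray spaces are stripped again with tr before decoding, so the output is byte-for-byte what the replacement implies.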

EDIT: this also works on multi-line targets eg

target=$'foo\nbar' replacement=$'bar\nfoo'

answered Sep 20 '22 05:09 by wef