I'm trying to sanitize double quotes (") in a text file by replacing them with \" or an empty string. The replacement must occur only in the substring delimited by TOKEN placeholders.
Example
# input (example.txt):
Line title "A" - TOKEN some line with "quotation marks"
Line title "B" - TOKEN some line with "another quotation marks"
Line title "C" - TOKEN some "line" TOKEN more "text"
Random "line"
# result (example.txt)
Line title "A" - TOKEN some line with \"quotation marks\"
Line title "B" - TOKEN some line with \"another quotation marks\"
Line title "C" - TOKEN some \"line\" TOKEN more "text"
Random "line"
# Another option
# result (example.txt)
Line title "A" - TOKEN some line with quotation marks
Line title "B" - TOKEN some line with another quotation marks
Line title "C" - TOKEN some line TOKEN more "text"
Random "line"
Preferably without external dependencies (e.g. Python, JS) on Linux, so sed, awk, or bash are probably best.
PS - What I've tried so far is:
sed -iE "s/TOKEN(.+)(\")(.+).*\TOKEN\1\3/g" /tmp/test
But it handles only a single replacement per line.
EDIT:
(sorry about late addition after many answers)
Assuming that:
1. TOKEN can be treated as a regular expression (or, if not, you will escape its metacharacters before using it),
2. lines where TOKEN doesn't occur should be left unchanged, and
3. TOKEN should match even if it's in the middle of another string,
then using any awk in any shell on every Unix box:
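$ awk '
    # Take the span from the first TOKEN to the last TOKEN,
    # or from a lone TOKEN to the end of the line.
    match($0,/TOKEN.*TOKEN/) || match($0,/TOKEN.*/) {
        tgt = substr($0,RSTART,RLENGTH)
        gsub(/"/, "\\\"", tgt)    # escape every quote inside the span
        $0 = substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)
    }
1' example.txt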
Line title "A" - TOKEN some line with \"quotation marks\"
Line title "B" - TOKEN some line with \"another quotation marks\"
Line title "C" - TOKEN some \"line\" TOKEN more "text"
Random "line"
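For the other option in the question (removing the quotes instead of escaping them), the same approach should work with an empty replacement string in gsub(). Below is a minimal sketch under the same assumptions as above; the function name strip_token_quotes is made up for illustration and is not part of the original answer.

```shell
# Hypothetical helper: delete double quotes inside the TOKEN-delimited
# part of each line. Reads the files given as arguments, or stdin if none.
strip_token_quotes() {
    awk '
    # Take the span from the first TOKEN to the last TOKEN,
    # or from a lone TOKEN to the end of the line.
    match($0, /TOKEN.*TOKEN/) || match($0, /TOKEN.*/) {
        tgt = substr($0, RSTART, RLENGTH)
        gsub(/"/, "", tgt)    # empty replacement removes the quotes
        $0 = substr($0, 1, RSTART-1) tgt substr($0, RSTART+RLENGTH)
    }
    1' "$@"
}
```

Running strip_token_quotes example.txt should print the second result shown in the question. It writes to standard output only, so to update the file, redirect the output to a temporary file and move that over the original.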