
Sed is not replacing all instances in a file when areas overlap

I need to replace several words with other words.

For example, replace "apple" with "FRUIT" in a file, but only in these 4 situations (an underscore below stands for a blank space):

  • _apple_, has a blank space before and after.
  • [apple_, has a square opening bracket before and a blank space after.
  • _apple], has a blank space before and a square closing bracket after.
  • [apple], has square brackets before and after.

I do not want the replacement to occur in any other situation.

I have tried using the following code:

a="apple"
b="fruit"
sed -i "s/ $a / $b /g" ./file
sed -i "s/\[$a /\[$b /g" ./file
sed -i "s/ $a\]/ $b\]/g" ./file
sed -i "s/\[$a\]/\[$b\]/g" ./file

I thought the "g" flag at the end would replace all instances, but I found this is not a thorough solution. For example, if the file contains this:

apple spider apple apple spider tree apple tree

The third occurrence of "apple" is not replaced. In this line too, several occurrences of the word are left unchanged:

apple  spider apple apple apple apple apple spider tree apple tree

I suspect this is because adjacent occurrences share the same space.
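That suspicion can be checked with a minimal case. The sketch below is an illustration, not part of the original attempt: the input string and the variable name `out` are made up for the demonstration. With the "g" flag, sed resumes scanning after the end of each match, so a space consumed by one match cannot also start the next one.

```shell
# The first match " apple " consumes the space that follows it, so the
# second "apple" no longer has a leading space left to match against.
out=$(printf '%s\n' 'x apple apple y' | sed 's/ apple / FRUIT /g')
printf '%s\n' "$out"   # prints: x FRUIT apple y
```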

How can I get this to find and replace all instances of $a with $b, regardless of any overlap?

Village asked Dec 27 '22 08:12


2 Answers

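You can do this using backreferences. The regular expressions use only POSIX basic regular expression syntax, although the -i option itself is an extension (GNU and BSD sed) rather than part of POSIX. The example uses "badger" and "SNAKE" in place of "apple" and "FRUIT", and also treats the start and end of a line as boundaries: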

sed -i 's/^badger\([] ]\)/SNAKE\1/g;
        s/\([[ ]\)badger$/\1SNAKE/g;
        s/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g;
        s/ badger]/ SNAKE]/g' ./infile

Example

$ sed 's/^badger\([] ]\)/SNAKE\1/g;s/\([[ ]\)badger$/\1SNAKE/g;s/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g;s/ badger]/ SNAKE]/g' <<<"badger [badger badger] [badger] badger foobadger badgering mushroom badger"
SNAKE [SNAKE SNAKE] [SNAKE] SNAKE foobadger badgering mushroom SNAKE
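One case worth checking against these commands is a run of three or more space-separated occurrences, which is the situation in the question. The sketch below feeds such a line (an invented test input, with `out` as an illustrative variable name) through the same four substitutions. The middle word is left alone, because the last command only repairs the " badger]" overlap, so a further pass may still be needed for input like this.

```shell
# Same four substitutions as above, written one per line.
out=$(printf '%s\n' 'x badger badger badger y' | sed '
s/^badger\([] ]\)/SNAKE\1/
s/\([[ ]\)badger$/\1SNAKE/
s/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g
s/ badger]/ SNAKE]/g')
printf '%s\n' "$out"   # prints: x SNAKE badger SNAKE y
```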
SiegeX answered Jan 14 '23 06:01


The quick-and-dirty solution is to perform the replacement twice.

$ echo apple apple apple apple[apple apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g; s/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
apple FRUIT FRUIT apple[FRUIT FRUIT]

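This is safe because, after the first command, the resulting text won't contain any occurrences of (\[| )apple( |\]) that were not already in the original text. Two passes are enough because an occurrence is only skipped when the match just before it consumed its leading delimiter, so two skipped occurrences never share a delimiter.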

The drawback is that two replacements take roughly twice as long to run.

If you split it into two separate sed invocations, you can see the steps more clearly:

$ echo apple apple apple apple apple apple[apple apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
apple FRUIT apple FRUIT apple apple[FRUIT apple]

$ echo apple FRUIT apple FRUIT apple apple[FRUIT apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
apple FRUIT FRUIT FRUIT FRUIT apple[FRUIT FRUIT]
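The same idea can also be written as a loop that repeats the substitution until nothing changes, using sed's label (`:`) and branch-on-success (`t`) commands. The following is a minimal sketch under some assumptions: `replace_delimited` is a hypothetical helper name, the word and replacement are assumed to contain no characters special to sed (such as `/`, `&` or `\`), and the replacement must not itself match the pattern, otherwise the loop would never terminate. It uses bracket expressions instead of `\|`, since alternation in basic regular expressions is a GNU extension.

```shell
# replace_delimited WORD REPLACEMENT  (hypothetical helper, reads stdin)
# Replaces WORD only when it is preceded by a space or "[" and followed
# by a space or "]". The "t" command jumps back to the label whenever
# the substitution changed something, so overlapping matches that one
# pass skips are picked up on the next pass.
replace_delimited() {
    sed -e ':again' \
        -e "s/\([[ ]\)$1\([] ]\)/\1$2\2/g" \
        -e 't again'
}

out=$(printf '%s\n' 'apple spider apple apple apple apple apple spider tree apple tree' |
      replace_delimited apple FRUIT)
printf '%s\n' "$out"
# prints: apple spider FRUIT FRUIT FRUIT FRUIT FRUIT spider tree FRUIT tree
```

The first "apple" stays as it is because it sits at the start of the line, with no space or bracket before it.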
igorrs answered Jan 14 '23 07:01