Remove duplicate words in a line with sed

Question

Purely academic, but it's frustrating me.

I want to correct this text:

there there are are multiple lexical errors in this line line

using sed. I've got this far:

sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g' < file.text

It corrects everything except the final doubled up words!

there are multiple lexical errors in this line line

Can a sed guru please explain why the above doesn't deal with the words at the end?

codaddict · Accepted Answer

This is because in the last case (line) the capture group \1 holds "line " (line followed by a space), and the regex then searches for a repetition of exactly that. Since there is no space after the final line, the match fails.

To fix this, add a space after the final word line.
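The behaviour described above can be reproduced with a short sketch. It assumes GNU sed; the function name dedupe_original and the variable text are hypothetical helpers that only wrap the command and sample sentence from the question.

```shell
# Hypothetical wrapper around the command from the question (assumes GNU sed).
dedupe_original() {
    sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g'
}

text='there there are are multiple lexical errors in this line line'

# No separator follows the last word, so \1 ("line ") cannot match a second time
# and the final pair is left alone.
printf '%s\n' "$text" | dedupe_original

# Workaround: append a space so the last word is also followed by a separator.
# The collapsed result keeps that trailing space.
printf '%s \n' "$text" | dedupe_original
```

The second call should collapse the final pair as well, at the cost of leaving a trailing space on the line.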

Alternatively you can change the regex to:

sed -e 's/\b\([a-z]\+\)[ ,\n]\1/\1/g'
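A minimal sketch of this regex in use, assuming GNU sed (\b and \+ are GNU extensions). The function names are hypothetical. dedupe_strict is a variation that is not part of the answer: it adds a trailing \b so the repeated text must be a whole word rather than the prefix of a longer one.

```shell
# Hypothetical wrappers; both assume GNU sed.
dedupe_fixed() {
    # The answer's regex: capture only the word, match the separator outside the group.
    sed -e 's/\b\([a-z]\+\)[ ,\n]\1/\1/g'
}

dedupe_strict() {
    # Variation (an assumption, not from the answer): the trailing \b requires
    # the repeat to end at a word boundary.
    sed -e 's/\b\([a-z]\+\)[ ,\n]\1\b/\1/g'
}

# The sample sentence from the question, including the doubled word at the end.
printf '%s\n' 'there there are are multiple lexical errors in this line line' | dedupe_fixed

# Without the trailing \b, "the" also matches the start of "theory".
printf '%s\n' 'the theory' | dedupe_fixed
printf '%s\n' 'the theory' | dedupe_strict
```

Because the separator now sits outside the capture group, the last word no longer needs a trailing space. The "the theory" lines illustrate why the stricter variant may be preferable on real text: the answer's regex would merge the two words, while the variant should leave them untouched.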



Tags:

sed

benjwy
