Purely academic, but it's frustrating me.
I want to correct this text:
there there are are multiple lexical errors in this line line
using sed. I've got this far:
sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g' < file.text
It corrects everything except the final doubled up words!
there are multiple lexical errors in this line line
Can a sed guru please explain why the above doesn't deal with the words at the end?
This is because in the last case (line line) the captured group \1 holds "line " (line followed by a space), and you are searching for a repetition of exactly that. Since there is no space after the final line, the match fails.
To fix this, add a space after the final word line.
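A minimal sketch of that explanation, assuming GNU sed (where \n inside a bracket expression means a newline). The helper name fix_doubles and the variable names are made up for illustration; the sed expression is the one from the question. The square brackets in the output are only there to make the trailing space visible.

```shell
# Hypothetical helper wrapping the asker's original expression.
fix_doubles() { sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g'; }

# No trailing space: the captured "line " has nothing to repeat against.
without_space=$(printf '%s\n' 'there there are are multiple lexical errors in this line line' | fix_doubles)

# Trailing space added: "line " is now followed by a second "line ".
with_space=$(printf '%s\n' 'there there are are multiple lexical errors in this line line ' | fix_doubles)

printf '[%s]\n' "$without_space" "$with_space"
# [there are multiple lexical errors in this line line]
# [there are multiple lexical errors in this line ]
```

Note that the fixed line keeps the space that was appended.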
Alternatively you can change the regex to:
sed -e 's/\b\([a-z]\+\)[ ,\n]\1/\1/g'
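For reference, here is a small sketch of that command run against the sample line. It assumes GNU sed, since \b and \+ are GNU extensions to basic regular expressions. The helper names dedupe and dedupe_strict are hypothetical, and the second expression, with a trailing \b, is a suggested variant rather than part of the answer: without it, a word followed by a longer word that starts with the same letters (the then) is also collapsed.

```shell
# Hypothetical helper wrapping the expression from the answer.
dedupe() { sed -e 's/\b\([a-z]\+\)[ ,\n]\1/\1/g'; }

# Assumed variant (not from the answer): a trailing \b requires the
# repeated text to be a whole word, not just a prefix of the next word.
dedupe_strict() { sed -e 's/\b\([a-z]\+\)[ ,\n]\1\b/\1/g'; }

fixed=$(printf '%s\n' 'there there are are multiple lexical errors in this line line' | dedupe)
loose=$(printf '%s\n' 'the then and and' | dedupe)
strict=$(printf '%s\n' 'the then and and' | dedupe_strict)

printf '%s\n' "$fixed" "$loose" "$strict"
# there are multiple lexical errors in this line
# then and
# the then and
```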