I have two sentences containing duplicate words. For example, the input file my_text.txt contains:
The Unix and Linux operating system.
The Unix and Linux system was to create an environment that promoted efficient program.
I used this script:
while read p
do
echo "$p"|sort -u | uniq
done < my_text.txt
But the output has the same content as the input file:
The Unix and Linux operating system. The Unix and Linux system was to create an environment that promoted efficient program
How can I remove the duplicate words from both sentences?
Your code would remove repeated lines; both sort and uniq operate on lines, not words. (And even then, the loop is superfluous; if you wanted to do that, your code should be simplified to just sort -u my_text.txt.)
The usual fix is to split the input to one word per line; there are some complications with real-world text, but the first basic Unix 101 implementation looks like
tr ' ' '\n' <my_text.txt | sort -u
Of course, this gives you the words in a different order than in the original, and saves the first occurrence of every word. If you wanted to discard any words which occur more than once, maybe try
tr ' ' '\n' <my_text.txt | sort | uniq -c | awk '$1 == 1 { print $2 }'
(If your tr doesn't recognize \n as newline, maybe try '\012'.)
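If the original word order matters, one possible variation is to skip sort entirely and let Awk remember which words it has already printed. This is only a sketch: the function name unique_words_in_order is made up for illustration, and it still treats "system." and "system" as different words.

```shell
# Hypothetical helper: print each distinct word once, in order of first appearance.
unique_words_in_order() {
    # tr -s turns spaces into newlines and squeezes repeats, so runs of
    # spaces do not produce empty "words".
    # awk prints a non-empty line only the first time it sees it.
    tr -s ' ' '\n' < "$1" | awk 'NF && !seen[$0]++'
}

# Usage (joins the words back onto one line):
#   unique_words_in_order my_text.txt | xargs
```

Unlike sort -u, this keeps the first occurrence of every word where it originally appeared.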
Here is a dead simple two-pass Awk script which hopefully is a little bit more useful. It collects all the words into memory during the first pass over the file, then on the second, removes any words which occurred more than once.
awk 'NR==FNR { for (i=1; i<=NF; ++i) ++a[$i]; next }
{ for (i=1; i<=NF; ++i) if (a[$i] > 1) $i="" } 1' my_text.txt my_text.txt
This leaves whitespace where words were removed; fixing that should be easy enough with a final sub().
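As a rough sketch of that cleanup, the two-pass script can squeeze the leftover gaps before printing each line. The wrapper function drop_repeated_words and the array name count are illustrative names, and gsub() is used rather than sub() because a line can contain several gaps.

```shell
# Hypothetical helper: drop every word that occurs more than once in the file.
drop_repeated_words() {
    # First pass (NR==FNR): count every word.
    # Second pass: blank out repeated words, then tidy the whitespace.
    awk 'NR==FNR { for (i=1; i<=NF; ++i) ++count[$i]; next }
         {
             for (i=1; i<=NF; ++i) if (count[$i] > 1) $i = ""
             gsub(/ +/, " ")      # squeeze the gaps left by removed words
             gsub(/^ | $/, "")    # trim a leading or trailing space
             print
         }' "$1" "$1"
}

# Usage:
#   drop_repeated_words my_text.txt
```

The file is named twice on purpose, since Awk has to read it once to count and once to print.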
A somewhat more useful program would split off any punctuation, and reduce words to lowercase (so that Word, word, Word!, and word? don't count as separate).
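A minimal sketch of that normalization, assuming it is acceptable to treat every non-letter character as a word separator (so "doesn't" would be split into "doesn" and "t"). The function name normalized_unique_words is hypothetical.

```shell
# Hypothetical helper: list the distinct words of a file, ignoring case
# and punctuation.
normalized_unique_words() {
    # tr -cs replaces every run of non-letters with a single newline;
    # the second tr lowercases; sed drops an empty first line, if any;
    # sort -u keeps one copy of each word.
    tr -cs '[:alpha:]' '\n' < "$1" |
        tr '[:upper:]' '[:lower:]' |
        sed '/^$/d' |
        sort -u
}

# Usage:
#   normalized_unique_words my_text.txt | xargs
```

As with the basic tr and sort -u pipeline, the result is in sorted order rather than the original order.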
You can use this command to remove duplicate words from both sentences:
tr ' ' '\n' <my_text.txt | sort | uniq | xargs