I have a plain text file with words separated by commas, for example:
word1, word2, word3, word2, word4, word5, word 3, word6, word7, word3
I want to delete the duplicates so it becomes:
word1, word2, word3, word4, word5, word6, word7
Any ideas? I think egrep can help me, but I'm not sure how to use it exactly.
The uniq command removes duplicate lines from a text file on Linux. By default it discards all but the first of adjacent repeated lines, so no line is repeated in the output; with the -d option it instead prints only the duplicated lines. Because uniq only compares adjacent lines, it is normally used on sorted input.
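To see why sorting matters: uniq only collapses repeats that sit next to each other. A quick sketch, using a hypothetical file words.txt with one word per line:

$ printf 'word2\nword3\nword2\n' > words.txt
$ uniq words.txt          # the two word2 lines are not adjacent, so both survive
word2
word3
word2
$ sort words.txt | uniq   # sorting groups the repeats, so uniq drops the extra one
word2
word3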
Assuming that the words are one per line, and the file is already sorted:
uniq filename
If the file's not sorted:
sort filename | uniq
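As a side note, sort -u does the sorting and the de-duplication in one step, so the pipe through uniq can be dropped:
sort -u filename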
If they're not one per line, and you don't mind them being one per line:
tr -s '[:space:]' '\n' < filename | sort | uniq
That doesn't remove punctuation, though, so maybe you want:
tr -s '[:space:][:punct:]' '\n' < filename | sort | uniq
But that removes the hyphen from hyphenated words. "man tr" for more options.
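Putting it together on the sample line from the question, a rough sketch (the file name words.txt, and the trailing paste/sed step that glues the result back into a comma-separated line, are my own additions, not part of the answer above):

tr -s '[:space:][:punct:]' '\n' < words.txt | sort | uniq | paste -sd, - | sed 's/,/, /g'

With the sample input this yields 3, word, word1, word2, word3, word4, word5, word6, word7 (the exact ordering can depend on the locale's sort order); note that "word 3" from the example gets split on the space, which is the same one-word-per-line caveat mentioned above.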