
How to remove duplicate words from a plain text file using a Linux command


I have a plain text file with words, which are separated by comma, for example:

word1, word2, word3, word2, word4, word5, word3, word6, word7, word3 

I want to delete the duplicates so it becomes:

word1, word2, word3, word4, word5, word6, word7 

Any ideas? I think egrep can help me, but I'm not sure how to use it exactly.

asked Jun 04 '09 by cupakob

People also ask

How do I remove duplicates from a text file in Linux?

The uniq command is used to remove duplicate lines from a text file in Linux. By default, this command discards all but the first of adjacent repeated lines, so that no output lines are repeated. Optionally, it can instead only print duplicate lines.
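A quick sketch of that adjacency rule (the file name and words are illustrative):

```shell
# Create a sample file with adjacent and non-adjacent repeats.
printf 'apple\napple\nbanana\napple\n' > sample.txt

# uniq collapses only ADJACENT duplicates: the trailing "apple" survives.
uniq sample.txt
# apple
# banana
# apple
```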

Which command is used to remove the duplicate records in file?

The uniq command helps remove or detect duplicate entries in a file.

How do you find repeated words in Linux?

The uniq command in Linux is used to filter out repeated lines in a text file. This command can be helpful if you want to remove duplicate words or strings from a text file. Since the uniq command matches only adjacent lines when finding redundant copies, it works reliably only with sorted text files.
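Because of that adjacency requirement, running sort first makes all duplicates neighbors, so uniq removes every repeat (sample data is illustrative):

```shell
# Sorting first groups all copies of a word together,
# so uniq drops every duplicate, not just adjacent ones.
printf 'apple\napple\nbanana\napple\n' | sort | uniq
# apple
# banana
```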


1 Answer

Assuming that the words are one per line, and the file is already sorted:

uniq filename 

If the file's not sorted:

sort filename | uniq 
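An equivalent shorthand (not mentioned in the answer) is sort's -u flag, which sorts and deduplicates in one step; the sample file is illustrative:

```shell
# sort -u is equivalent to: sort words.txt | uniq
printf 'word2\nword1\nword2\n' > words.txt
sort -u words.txt
# word1
# word2
```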

If they're not one per line, and you don't mind them being one per line:

tr -s '[:space:]' '\n' < filename | sort | uniq 

That doesn't remove punctuation, though, so maybe you want:

tr -s '[:space:][:punct:]' '\n' < filename | sort | uniq 

But that also splits hyphenated words at the hyphen. See "man tr" for more options.
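The question's input is comma-separated rather than one word per line, so a pipeline in the same spirit could split on commas, deduplicate, and rejoin. This is a sketch, not part of the answer; note that sort -u reorders the words alphabetically, which happens to match the asked-for output here:

```shell
# Split on commas, strip leading spaces, deduplicate,
# then rejoin with ", " to match the original format.
echo 'word1, word2, word3, word2, word4, word5, word3, word6, word7, word3' \
  | tr ',' '\n' | sed 's/^ *//' | sort -u | paste -s -d ',' - | sed 's/,/, /g'
# word1, word2, word3, word4, word5, word6, word7
```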

answered Oct 06 '22 by Randy Orrison