In Linux, I have a text file that contains duplicate words, like this:
abc line 1
xyz zzz
123 456
abc end line
Now I want to print only the DUPLICATE words (such as abc). How can I do that?
The uniq command in Linux filters adjacent matching lines in a text file. It can report or remove duplicate lines, but it compares whole lines rather than individual words, so to find duplicate words each word must first be put on its own line. Because uniq only compares adjacent lines, the input has to be sorted first for it to catch every duplicate.
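To illustrate why sorting matters, here is a minimal sketch using a made-up three-line input in which abc appears twice but never on adjacent lines. The variable names unsorted and sorted are only for illustration.

```shell
# Hypothetical input: "abc" occurs twice, separated by "xyz"
unsorted=$(printf '%s\n' abc xyz abc | uniq -d)
sorted=$(printf '%s\n' abc xyz abc | sort | uniq -d)

# Without sort, the two "abc" lines are not adjacent, so uniq -d reports nothing
echo "without sort: '$unsorted'"
# After sort, the duplicates sit next to each other and uniq -d reports abc
echo "with sort:    '$sorted'"
```

uniq -d prints one copy of each line that is repeated on adjacent lines, which is why the sort step comes first.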
Using awk's associative array, each record is stored as an index, and the value is the number of times that record appears in the file. At the end, only the records whose count is greater than 1 are printed, since a count above 1 indicates a duplicate record.
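The awk description above counts whole records (lines). A minimal sketch that adapts the same idea to individual words might look like the following. It assumes words are separated by whitespace, and the temporary file and the variable names tmpfile and dups are illustrative only.

```shell
# Recreate the sample data from the question in a temporary file (illustrative)
tmpfile=$(mktemp)
printf '%s\n' 'abc line 1' 'xyz zzz' '123 456' 'abc end line' > "$tmpfile"

# Count every whitespace-separated field in an associative array,
# then print the fields seen more than once. awk's for-in order is
# unspecified, so the result is piped through sort for stable output.
dups=$(awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
            END { for (w in count) if (count[w] > 1) print w }' "$tmpfile" | sort)

echo "$dups"
rm -f "$tmpfile"
```

With the sample data this prints abc and line, since both occur twice. Because fields are split on whitespace only, punctuation attached to a word would count as part of that word.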
You can tokenize the words with grep -wo, find consecutive duplicates with uniq -d, and add -c to count the number of duplicates, e.g.:
grep -wo '[[:alnum:]]\+' infile | sort | uniq -cd
Output:
2 abc
2 line