
Find Duplicate/Repeated or Unique words in file spanning across multiple lines

In Linux, I have a text file that contains duplicate words, like this:

abc line 1
xyz zzz
123 456
abc end line

Now I want to print only the DUPLICATE words (such as abc). How can I do that?

Syed Jahanzaib asked Feb 26 '14 07:02


People also ask

How do you find repeated words in Linux?

The uniq command in Linux reports or filters out repeated lines in a text file. It works on whole lines, so to find or remove duplicate words you first have to put each word on its own line. Because uniq only compares adjacent lines, the input normally has to be sorted first.
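As a small sketch of the sort-then-uniq pattern, assuming a made-up list of fruit names with one entry per line (the data and variable names are only for illustration):

```shell
# Hypothetical sample data: one entry per line, unsorted.
tmp=$(mktemp)
printf '%s\n' pear apple pear kiwi apple pear > "$tmp"

# uniq compares only adjacent lines, so sort first.
dupes=$(sort "$tmp" | uniq -d)     # lines that occur more than once
singles=$(sort "$tmp" | uniq -u)   # lines that occur exactly once

printf 'repeated:\n%s\n' "$dupes"
printf 'unique:\n%s\n' "$singles"
rm -f "$tmp"
```

Here -d keeps one copy of each repeated line and -u keeps only the lines that never repeat.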

How do I find duplicates in awk?

Using awk's associative array, every record is stored as an index, and the value is the number of times that record appears in the file. At the end, only the records whose count is greater than 1 are printed, since a count above 1 indicates a duplicate record.
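A rough sketch of that idea, adapted to count words (fields) rather than whole records so that it fits the question above. The temporary file and variable names are assumptions for the example, and the final sort is only there because awk's for-in loop visits keys in no particular order:

```shell
# Sample input from the question, written to a temporary file.
infile=$(mktemp)
printf '%s\n' 'abc line 1' 'xyz zzz' '123 456' 'abc end line' > "$infile"

# count[word] holds how many times each whitespace-separated word was seen.
# In END, print only the words seen more than once.
awk_dupes=$(awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END { for (w in count) if (count[w] > 1) print count[w], w }
' "$infile" | LC_ALL=C sort)

printf '%s\n' "$awk_dupes"
rm -f "$infile"
```

To count whole lines instead, as the paragraph describes, the loop can be replaced with count[$0]++.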


1 Answer

You can tokenize the words with grep -wo, sort them so that repeated words end up next to each other, and print the repeated ones with uniq -d. Add -c to count how many times each one occurs, e.g.:

grep -wo '[[:alnum:]]\+' infile | sort | uniq -cd

Output:

2 abc
2 line
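For the "unique words" half of the title, the same pipeline should work with uniq -u in place of uniq -d. This is only a sketch: it recreates the sample file from the question in a temporary file, uses the extended-regex spelling (-E with +) of the same pattern, and forces the C locale so the sort order is predictable:

```shell
# Sample input from the question, written to a temporary file.
infile=$(mktemp)
printf '%s\n' 'abc line 1' 'xyz zzz' '123 456' 'abc end line' > "$infile"

# One word per line, sorted so repeats are adjacent, then keep only
# the words that occur exactly once (uniq -u instead of uniq -d).
unique_words=$(grep -woE '[[:alnum:]]+' "$infile" | LC_ALL=C sort | uniq -u)

printf '%s\n' "$unique_words"
rm -f "$infile"
```

With the sample input, abc and line are dropped because they repeat, and every other word is printed once.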
Thor answered Oct 07 '22 06:10
