Print the duplicate lines in a file using awk

Tags: sed, awk

I need to print all the duplicated lines in a file on a system where uniq does not support the -D option, so I am looking for an alternative way to print the duplicate lines using awk. I know awk has an idiom like the one below.

testfile.txt

apple
apple
orange
orange
cherry
cherry
kiwi
strawberry
strawberry
papaya
cashew
cashew
pista

The command:

awk 'seen[$0]++' testfile.txt

But this prints only the second and later occurrences of each line, so every duplicated pair shows up just once. I need the same output that uniq -D produces, like this:

apple
apple
orange
orange
cherry
cherry
strawberry
strawberry
cashew
cashew
user3834663 asked Apr 08 '16 17:04


2 Answers

With sed:

$ sed 'N;/^\(.*\)\n\1$/p;$d;D' testfile.txt
apple
apple
orange
orange
cherry
cherry
strawberry
strawberry
cashew
cashew

This does the following:

N                 # Append next line to pattern space
/^\(.*\)\n\1$/p   # Print if lines in pattern space are identical
$d                # Avoid printing lone non-duplicate last line
D                 # Delete first line in pattern space

There are a few limitations:

  • It only works for contiguous duplicates, i.e., not for

    apple
    orange
    apple
    
  • Lines appearing more than twice in a row throw it off: every overlapping pair is printed, so a run of three identical lines comes out as four lines.
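
The second limitation can be avoided by tracking runs in awk instead of comparing pairs. The following is only a sketch, assuming that adjacent duplicates are what matters (as with uniq -D); the function name dup_runs is made up for illustration and is not part of the original answer.

```shell
# Print every line that belongs to a run of two or more identical adjacent lines.
# Reads the files given as arguments, or standard input if there are none.
dup_runs() {
  awk '
    NR > 1 && ($0 "") == prev {     # current line repeats the previous one
        if (!printed) print prev    # emit the first line of the run, once
        print                       # emit the current repeat
        printed = 1
        next
    }
    { prev = $0 ""; printed = 0 }   # a new run starts; "" forces string comparison
  ' "$@"
}

# A run of three, a lone line, then a run of two.
printf '%s\n' apple apple apple kiwi cherry cherry | dup_runs
```

This prints apple three times and cherry twice. Appending "" to $0 keeps awk from comparing numeric-looking lines such as 1 and 1.0 as numbers. Like the sed command, it still ignores duplicates that are not adjacent.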

Benjamin W. answered Dec 11 '22 02:12


If you want to stick with plain awk, you'll have to process the file twice: once to count how often each line occurs, and once to drop the lines whose count equals 1:

awk 'NR==FNR {count[$0]++; next} count[$0]>1' testfile.txt testfile.txt
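
Naming the file twice only works when the input can be read twice. If the data arrives on a pipe, a single pass that buffers the lines is one possible variant. This is a sketch that assumes the whole input fits in memory; the function name dups_from_stdin is hypothetical.

```shell
# Single pass: remember every line in input order and count occurrences,
# then print the lines whose count is greater than 1 once all input is read.
dups_from_stdin() {
  awk '
    { line[NR] = $0; count[$0]++ }
    END {
      for (i = 1; i <= NR; i++)
        if (count[line[i]] > 1) print line[i]
    }
  '
}

printf '%s\n' apple orange apple kiwi orange | dups_from_stdin
```

This prints apple, orange, apple, orange. Like the two-pass command, it counts occurrences across the whole input, so it also reports duplicates that are not adjacent, which uniq -D would not.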
glenn jackman answered Dec 11 '22 01:12