Print the duplicate lines in a file using awk

Question

I need to print all the duplicated lines in a file, but the uniq on my system does not support the -D option. So I am looking for another way to print the duplicate lines, using awk. I know awk has an idiom like the one below.

testfile.txt

```
apple
apple
orange
orange
cherry
cherry
kiwi
strawberry
strawberry
papaya
cashew
cashew
pista
```

The command:

```
awk 'seen[$0]++' testfile.txt
```

But this prints only the second and later occurrences of each line, so each duplicated line appears just once. I need the same output that uniq -D produces, like this:

```
apple
apple
orange
orange
cherry
cherry
strawberry
strawberry
cashew
cashew
```
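Why seen[$0]++ drops the first copy: the post-increment returns the count before it is bumped, so the pattern is 0 (false) the first time a line appears and 1, 2, ... (true) afterwards. The sketch below recreates the sample file (the name testfile.txt is taken from the question) and runs the idiom; a line that occurs n times is printed n - 1 times.

```shell
# Recreate the sample input from the question.
printf '%s\n' apple apple orange orange cherry cherry kiwi \
  strawberry strawberry papaya cashew cashew pista > testfile.txt

# seen[$0]++ yields the count before incrementing: 0 (false) on a line's
# first appearance, then 1, 2, ... (true). awk prints a line only when the
# pattern is true, so every occurrence except the first is printed.
awk 'seen[$0]++' testfile.txt
```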

Benjamin W. · Accepted Answer

With sed:

```
$ sed 'N;/^\(.*\)\n\1$/p;$d;D' testfile.txt
apple
apple
orange
orange
cherry
cherry
strawberry
strawberry
cashew
cashew
```

This does the following:

```
N                 # Append next line to pattern space
/^\(.*\)\n\1$/p   # Print if the two lines in pattern space are identical
$d                # Avoid printing a lone non-duplicate last line
D                 # Delete first line of pattern space, restart cycle without reading input
```

There are a few limitations:

- It only works for contiguous duplicates; it finds nothing in, for example:
```
apple
orange
apple
```
- Lines appearing more than twice in a row throw it off: a run of three identical lines is printed four times.
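A single-pass awk sketch (not from either answer) that handles adjacent runs of any length: it buffers the current run of identical lines and prints the whole run only when it has two or more lines, which should match what GNU uniq -D prints. The function name dup_runs is hypothetical. Concatenating with "" forces a string comparison, so lines such as 1 and 1.0 are not treated as equal.

```shell
# dup_runs (hypothetical name): print every line that belongs to a run of
# two or more identical adjacent lines, like uniq -D.
dup_runs() {
  awk '
    # Print the buffered run only if it holds more than one line.
    function flush(   i) { if (n > 1) for (i = 1; i <= n; i++) print buf[i] }
    # Same as the previous line: extend the current run.
    NR > 1 && ($0 "") == (prev "") { buf[++n] = $0; next }
    # Different line: emit the finished run, then start a new one.
    { flush(); n = 0; buf[++n] = $0; prev = $0 }
    END { flush() }
  ' "$@"
}

printf '%s\n' apple apple orange orange cherry cherry kiwi \
  strawberry strawberry papaya cashew cashew pista > testfile.txt
dup_runs testfile.txt
```

Like uniq, it only compares neighbouring lines, so apple, orange, apple prints nothing; scattered duplicates need the two-pass awk command.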

glenn jackman · Answer

If you want to stick with plain awk, you'll have to process the file twice: once to count each line, and once to print only the lines whose count is greater than 1:

```
awk 'NR==FNR {count[$0]++; next} count[$0]>1' testfile.txt testfile.txt
```
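The two-pass command needs a file it can read twice. When the data arrives on a pipe, one option (a sketch, assuming the whole input fits in memory; all_dups is a hypothetical name) is to store every line while counting, then print from the END block. Like the two-pass version, it reports non-contiguous duplicates too, in their original order.

```shell
# all_dups (hypothetical name): one pass over the input, all lines kept in memory.
all_dups() {
  awk '
    { line[++n] = $0; count[$0]++ }   # remember input order and tally each line
    END {
      for (i = 1; i <= n; i++)        # replay lines in their original order
        if (count[line[i]] > 1) print line[i]
    }
  ' "$@"
}

printf '%s\n' apple orange apple kiwi | all_dups
```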

Tags: sed, awk
user3834663