I need to print all the duplicated lines in a file, but the uniq on my system does not support the -D option. So I am looking for an alternative way to print the duplicate lines using awk. I know awk has an idiom like the one below.
testfile.txt
apple
apple
orange
orange
cherry
cherry
kiwi
strawberry
strawberry
papaya
cashew
cashew
pista
The command:
awk 'seen[$0]++' testfile.txt
But this prints each duplicated line only once (more precisely, it prints every occurrence after the first one). I need the same output that uniq -D produces, like this:
apple
apple
orange
orange
cherry
cherry
strawberry
strawberry
cashew
cashew
With sed:
$ sed 'N;/^\(.*\)\n\1$/p;$d;D' testfile.txt
apple
apple
orange
orange
cherry
cherry
strawberry
strawberry
cashew
cashew
This does the following:
N # Append next line to pattern space
/^\(.*\)\n\1$/p # Print if lines in pattern space are identical
$d # Avoid printing lone non-duplicate last line
D # Delete first line in pattern space
There are a few limitations:
It only works for contiguous duplicates, which is also how uniq -D behaves (uniq only compares adjacent lines). It does not catch the duplicate in, e.g.,
apple
orange
apple
Lines appearing more than twice in a row throw it off: each adjacent pair is printed separately, so a run of three identical lines comes out as four.
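If you need exactly what uniq -D prints, including runs of three or more identical lines, one option is to buffer each run of adjacent identical lines in awk and print the run only when it is at least two lines long. This is a minimal sketch, assuming a POSIX-compatible awk; the shell function name dup_runs is hypothetical. Like uniq, it only looks at adjacent lines, so the apple/orange/apple example above prints nothing.

```shell
# dup_runs (hypothetical name): print every line of each run of two or
# more adjacent identical lines, like uniq -D. Reads the files given as
# arguments, or standard input if there are none.
dup_runs() {
    awk '
        # Print the buffered run, but only if it is 2+ lines long.
        function print_run(   i) {
            if (n > 1)
                for (i = 1; i <= n; i++)
                    print prev
            n = 0
        }
        # A different line ends the current run.
        # Appending "" forces a string (not numeric) comparison,
        # so lines such as 1 and 1.0 are treated as different.
        NR > 1 && ($0 "") != (prev "") { print_run() }
        { prev = $0; n++ }
        END { print_run() }
    ' "$@"
}
```

Usage: dup_runs testfile.txt, or pipe data into it. Running sort testfile.txt | dup_runs also groups non-adjacent duplicates, at the cost of losing the original line order.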
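If you want to stick with plain awk and also catch non-adjacent duplicates, you can process the file twice: the first pass (where NR==FNR, i.e. while awk reads the first copy of the file) counts each line, and the second pass prints only the lines whose count is greater than 1: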
awk 'NR==FNR {count[$0]++; next} count[$0]>1' testfile.txt testfile.txt
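The two-pass command names the file twice, so it needs a real file; a pipe can only be read once. If the input comes from standard input, one alternative is to keep every line in memory and do the filtering in the END block. This is a sketch under the assumption that the input fits in memory; dup_all is a hypothetical name. Like the two-pass command, it prints all copies of every line that occurs more than once, in the original order, whether or not the copies are adjacent.

```shell
# dup_all (hypothetical name): print every copy of each line that occurs
# more than once anywhere in the input, in the original order.
# Single pass over the input; all lines are held in memory.
dup_all() {
    awk '
        { line[NR] = $0; count[$0]++ }   # remember each line and count it
        END {
            # NR still holds the total number of records here.
            for (i = 1; i <= NR; i++)
                if (count[line[i]] > 1)
                    print line[i]
        }
    ' "$@"
}
```

Usage: some_command | dup_all, or dup_all testfile.txt. The trade-off versus the two-pass version is memory: this one stores every line, while the two-pass version only stores one counter per distinct line.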