
How to replace duplicated rows with "." in awk?

I need to replace duplicated values in my first column with just "."

For example:

name1
name1
name1
name2
name2
name3
name3

And the output I need is:

name1
.
.
name2
.
name3
.

I have a solution like this:

awk '{c=$1} c==p{gsub(/./,".",$1)} {p=c} 1' in.file

But the output is:

name1
.....
.....
name2
.....
name3
.....

Is there a solution that does not require piping into another command?

asked Dec 03 '25 21:12 by Geroge


1 Answer

Use an array to check if a line has already been seen!

$ awk 'seen[$0]++ {$0="."}1' file
name1
.
.
name2
.
name3
.

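The usual idiom for dropping repeated lines is awk '!seen[$0]++' file. Here we use the same logic with a small twist: the array seen[] counts how many times each line has appeared so far. Because ++ is a post-increment, seen[$0]++ evaluates to 0 (false) the first time a line appears and to a positive number (true) on every later occurrence, in which case {$0="."} replaces the record. The trailing 1 is an always-true pattern that triggers the default action, print, so every line is printed, either unchanged or as ".". Note that this marks every repeat of a line, not only consecutive ones.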
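
For comparison, the attempt in the question prints "....." because gsub(/./, ".", $1) replaces every character of the field with a dot; assigning the whole field fixes that. A minimal sketch that keeps the question's previous-value approach, and therefore only collapses consecutive repeats, might look like this (the function name dot_consecutive is hypothetical, chosen for illustration):

```shell
# Hypothetical helper: print "." in place of a first field that
# repeats the first field of the previous line.
dot_consecutive() {
    awk '{ c = $1 }            # remember the original first field
         c == p { $1 = "." }   # same as the previous line: overwrite the whole field
         { p = c }             # update the previous value
         1' "$@"               # always-true pattern: print the record
}

printf 'name1\nname1\nname2\nname1\n' | dot_consecutive
# prints:
# name1
# .
# name2
# name1
```

Unlike the seen[] version, the last name1 is kept here, because it does not directly follow another name1.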

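To compare a single column rather than the full line, replace $0 (the full record) with $n, where n is the field number. For the first column, as in the question, that means using $1 both as the array index and as the target of the assignment.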
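
As a sketch of the column-based variant, assuming whitespace-separated input with extra columns that should be kept (the function name dot_seen_first_col and the sample data are made up for illustration):

```shell
# Hypothetical helper: print "." in place of any first-column value
# that has already appeared on an earlier line.
dot_seen_first_col() {
    awk 'seen[$1]++ { $1 = "." }   # nonzero count: this key was seen before
         1' "$@"                   # print every record, modified or not
}

printf 'name1 10\nname1 20\nname2 30\n' | dot_seen_first_col
# prints:
# name1 10
# . 20
# name2 30
```

One thing to be aware of: assigning to $1 makes awk rebuild the record using OFS (a single space by default), so tabs or runs of spaces between fields are collapsed on the modified lines. Setting OFS, for example with -v OFS='\t', controls the separator used there.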

answered Dec 06 '25 13:12 by fedorqui 'SO stop harming'


