I need to replace repeated values in my first column with just "."
For example:
name1
name1
name1
name2
name2
name3
name3
And I need this output:
name1
.
.
name2
.
name3
.
I have a solution like this:
awk '{c=$1} c==p{gsub(/./,".",$1)} {p=c} 1' in.file
But the output is:
name1
.....
.....
name2
.....
name3
.....
Is there any solution without any other piping?
Use an array to check if a line has already been seen!
$ awk 'seen[$0]++ {$0="."}1' file
name1
.
.
name2
.
name3
.
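The typical way to skip repeated lines is awk '!seen[$0]++' file. Here we use the same logic with a small twist: the array seen[] counts how many times each line has appeared so far. seen[$0]++ returns the count before incrementing it, so it is 0 (false) the first time a line appears and greater than 0 (true) on every repeat, in which case {$0="."} runs. Finally, 1 is an always-true pattern that triggers the default print action, so awk prints either the original line or the "." that replaced it.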
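To see why the condition is false only on the first occurrence, it can help to print the value of seen[$0]++ next to each line. This is just an illustrative sketch; the function name show_seen_counts and the sample input are made up for the example.

```shell
# Hypothetical helper: print the value that seen[$0]++ evaluates to
# (the count *before* the increment) in front of every input line.
show_seen_counts() {
  awk '{ print seen[$0]++, $0 }' "$@"
}

# Sample input invented for this sketch.
printf 'name1\nname1\nname2\nname1\n' | show_seen_counts
# 0 name1
# 1 name1
# 0 name2
# 2 name1
```

Note that the last name1 still gets a non-zero count even though it does not directly follow another name1: the array remembers every line seen so far, not just the previous one.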
If you need to check a specific column rather than the full line, replace $0 (the full record) with $n, where n is the field number, in both the condition and the assignment.
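For the first column, as in the question, that could look like the sketch below. The function names and the two-column sample data are assumptions made for the example. The second function is a variant of the asker's original command with gsub swapped for a plain assignment; it only replaces a value when it repeats the one on the previous line.

```shell
# Hypothetical helper: replace any first-column value that has already
# appeared anywhere earlier in the input with ".".
dedupe_first_column() {
  awk 'seen[$1]++ { $1 = "." } 1' "$@"
}

# Hypothetical variant: only replace a value that repeats the first
# column of the immediately preceding line (consecutive duplicates).
dedupe_consecutive_first_column() {
  awk '{ cur = $1 } cur == prev { $1 = "." } { prev = cur } 1' "$@"
}

# Sample data invented for this sketch.
printf 'name1 a\nname1 b\nname2 c\nname2 d\n' | dedupe_first_column
# name1 a
# . b
# name2 c
# . d
```

One thing to be aware of: assigning to $1 makes awk rebuild the line using the output field separator (a single space by default), so any original tabs or runs of spaces on the modified lines are collapsed unless OFS is set to match.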