I want to remove the dot (.) only from the 4th and 5th columns of the table.
input
1 10057 . A AC
1 10146 . AC. A
1 10177 . A AC
1 10230 . AC .A,AN
1 10349 . CCCTA C,CCCTAA.
1 10389 . .AC A,AN
desired output
1 10057 . A AC
1 10146 . AC A
1 10177 . A AC
1 10230 . AC A,AN
1 10349 . CCCTA C,CCCTAA
1 10389 . AC A,AN
So I tried the following command.
awk 'BEGIN {OFS=FS="\t"} {gsub("\.","",$4);gsub("\.","",$5)}1' input
and I got this result (the whole 4th and 5th columns were removed):
1 10057 .
1 10146 .
1 10177 .
1 10230 .
1 10349 .
1 10389 .
Can you please point out what I need to change? Thanks in advance.
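When you use a string to hold an RE (e.g. "\."), the string is parsed twice: once when the script is read by awk and again when it is executed. In your case the first pass evidently reduced "\." to ".", which as an RE matches any character, so every character in your 4th and 5th columns was removed. To get a literal dot from a string you need to escape RE metacharacters twice (e.g. "\\.").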
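To see the double parsing in isolation, here is a small sketch run on a made-up string (`a.b.c` and the variable names `s1`, `s2`, `s3` are only for illustration, they are not from the question). The doubled backslash survives the string pass as `\.`, while a bracket expression `[.]` sidesteps the escaping issue because a dot inside brackets is always literal:

```shell
# "\\." : the string pass yields \. and the regex pass matches a literal dot
s1=$(echo 'a.b.c' | awk '{gsub("\\.","")} 1')
# "[.]" : a dot inside a bracket expression is literal, no escaping needed
s2=$(echo 'a.b.c' | awk '{gsub("[.]","")} 1')
# "."   : as a regex this matches every character, so nothing is left
s3=$(echo 'a.b.c' | awk '{gsub(".","")} 1')
printf '%s|%s|%s\n' "$s1" "$s2" "$s3"
```

The first two forms strip only the dots, while the third empties the line, which is the same symptom as in the question.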
The better solution in every way is not to specify the RE as a string but as an RE constant using the appropriate delimiters, e.g. /\./:
awk 'BEGIN {OFS=FS="\t"} {gsub(/\./,"",$4);gsub(/\./,"",$5)}1' input
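As a quick check, the sketch below runs that command on two rows from the question. The rows are rebuilt with printf using tab separators, which is an assumption based on FS="\t" in the command; the variable name `out` is only for illustration.

```shell
# Two rows from the question, rebuilt with tabs (assumed delimiter).
out=$(printf '1\t10146\t.\tAC.\tA\n1\t10389\t.\t.AC\tA,AN\n' |
  awk 'BEGIN {OFS=FS="\t"} {gsub(/\./,"",$4); gsub(/\./,"",$5)} 1')
# Print the cleaned rows; columns 4 and 5 should no longer contain dots.
printf '%s\n' "$out"
```

The lone dot in column 3 is left alone because gsub is applied only to $4 and $5.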