
Remove dot(.) from specific columns using gsub and awk

Tags: unix, awk, gsub

I want to remove the dot (.) only from the 4th and 5th columns of the table.

input
1    10057   .       A       AC      
1    10146   .       AC.      A       
1    10177   .       A       AC      
1    10230   .       AC      .A,AN    
1    10349   .       CCCTA   C,CCCTAA.              
1    10389   .       .AC      A,AN



desired output
1    10057   .       A       AC      
1    10146   .       AC      A       
1    10177   .       A       AC      
1    10230   .       AC      A,AN    
1    10349   .       CCCTA   C,CCCTAA              
1    10389   .       AC      A,AN    

So I tried the following command.

awk 'BEGIN {OFS=FS="\t"} {gsub("\.","",$4);gsub("\.","",$5)}1' input

and I got this result (the contents of the 4th and 5th columns were removed entirely).

1    10057   .          
1    10146   .            
1    10177   .        
1    10230   .       
1    10349   .                 
1    10389   .       

Can you please point out what I need to change? Thanks in advance.

jamie asked Sep 26 '13 19:09


1 Answer

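When you use a string to hold an RE (e.g. "\."), the string is parsed twice: once as a string literal when awk reads the script, and again as a regular expression when awk executes it. In the first pass, awk implementations such as gawk reduce "\." to ".", and as an RE a bare dot matches any character, which is why gsub deleted everything in columns 4 and 5. For a literal dot to survive both passes you need to escape RE metacharacters twice (e.g. "\\.").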
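As a rough illustration of the double parsing described above, the sketch below applies the three spellings of the regex to one hypothetical field value, `.AC.` (not taken from the question's data). The variable names are made up for this example. How an awk treats the unknown escape "\." inside a string is implementation-dependent, so only the last two results should be relied on.

```shell
# Hypothetical sample field; the three commands differ only in how the regex is written.
field='.AC.'

# "\." : gawk drops the backslash (with a warning on stderr), leaving the regex "."
# which matches any character. Other awks may keep the backslash instead.
as_string=$(printf '%s\n' "$field" | awk '{gsub("\.","",$1); print}' 2>/dev/null)

# "\\." : the string pass yields \. and the regex pass reads that as a literal dot.
as_escaped_string=$(printf '%s\n' "$field" | awk '{gsub("\\.","",$1); print}')

# /\./ : a regexp constant is parsed only once, so a single backslash is enough.
as_regex_constant=$(printf '%s\n' "$field" | awk '{gsub(/\./,"",$1); print}')

printf '[%s] [%s] [%s]\n' "$as_string" "$as_escaped_string" "$as_regex_constant"
```

With gawk the first bracket should come out empty, matching the result in the question, while the second and third contain AC.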

The better solution in every way is to write the RE as an RE constant between / delimiters instead of as a string, e.g. /\./, so that it is parsed only once:

awk 'BEGIN {OFS=FS="\t"} {gsub(/\./,"",$4);gsub(/\./,"",$5)}1' input
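To check the command without the original file, here is a minimal sketch that feeds it three rows from the question. It assumes the real file is tab-separated, which is what FS="\t" expects; the `input`, `result` variables and the printf-built rows are stand-ins for this example only.

```shell
# Tab-separated copy of three rows from the question (assumption: the file uses tabs).
input=$(printf '1\t10146\t.\tAC.\tA\n1\t10230\t.\tAC\t.A,AN\n1\t10349\t.\tCCCTA\tC,CCCTAA.\n')

# Strip dots from columns 4 and 5 only; the lone dot in column 3 is left untouched.
result=$(printf '%s\n' "$input" |
  awk 'BEGIN {OFS=FS="\t"} {gsub(/\./,"",$4); gsub(/\./,"",$5)} 1')

printf '%s\n' "$result"
```

If the file were separated by spaces rather than tabs, each line would be a single field under FS="\t" and the gsub calls on $4 and $5 would find nothing to change, so the delimiter is worth confirming first.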

Ed Morton answered Sep 19 '22 14:09