I have a text file like this:
VAREAKAVVLRDRKSTRLN 2888
ACP*VRWPIYTACGP 292
RDRKSTRLNSSHVVTSRMP 114
VAREA*KAVVLRDRRAHV*T 73
In the first column, some rows contain a "*". I want to remove every line whose first column contains a "*". Here is the expected output:
VAREAKAVVLRDRKSTRLN 2888
RDRKSTRLNSSHVVTSRMP 114
To do so, I am using this code:
awk -F "\t" '{ if(($1 == '*')) { print $1 "," $2} }' infile.txt > outfile.txt
This code does not return the expected output. How can I fix it?
You did
awk -F "\t" '{ if(($1 == '*')) { print $1 "," $2} }' infile.txt > outfile.txt
By writing $1 == "*" you are asking whether the first field is exactly *, not whether the first field contains *. You might use the index function, which returns the position of the match if one is found and 0 otherwise. Let infile.txt content be
VAREAKAVVLRDRKSTRLN 2888
ACP*VRWPIYTACGP 292
RDRKSTRLNSSHVVTSRMP 114
VAREA*KAVVLRDRRAHV*T 73
then
awk 'index($1,"*")==0{print $1,$2}' infile.txt
output
VAREAKAVVLRDRKSTRLN 2888
RDRKSTRLNSSHVVTSRMP 114
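To see why comparing against 0 works, here is a small sketch that calls index directly on two of the first-column values from the question. The variable names pos_hit and pos_miss are only illustrative.

```shell
# index(s, t) returns the 1-based position of t in s, or 0 if t is absent.
pos_hit=$(awk 'BEGIN { print index("ACP*VRWPIYTACGP", "*") }')
pos_miss=$(awk 'BEGIN { print index("VAREAKAVVLRDRKSTRLN", "*") }')

# Prints "4 0": the asterisk is the 4th character of the first string
# and does not occur in the second.
echo "$pos_hit $pos_miss"
```

So index($1,"*")==0 is true only for rows whose first field has no asterisk.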
Note that if you use index rather than a pattern /.../, you do not have to care about characters with special meaning in regular expressions, e.g. "." or "*". Note also that for the data you have, you do not have to set the field separator (FS) explicitly. Important: ' is not a legal string delimiter in GNU AWK; you should use " for that purpose, unless your intent is to summon hard-to-find bugs.
(tested in gawk 4.2.1)
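For comparison, here is a minimal sketch of the pattern-based alternative next to the index version. It assumes the sample rows from the question written to a temporary file with a tab between the columns (the separator is an assumption; the default field splitting handles spaces as well). The names tmp, by_index and by_regex are illustrative only.

```shell
# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
printf '%s\t%s\n' \
  VAREAKAVVLRDRKSTRLN 2888 \
  'ACP*VRWPIYTACGP' 292 \
  RDRKSTRLNSSHVVTSRMP 114 \
  'VAREA*KAVVLRDRRAHV*T' 73 > "$tmp"

# index() treats "*" as a plain string, so nothing needs escaping.
# With no action block, awk prints the whole matching line.
by_index=$(awk 'index($1, "*") == 0' "$tmp")

# The regex form has to escape the asterisk to match it literally.
by_regex=$(awk '$1 !~ /\*/' "$tmp")

printf '%s\n' "$by_index"
rm -f "$tmp"
```

Both commands should keep the same two rows; the index form simply avoids having to think about which characters the regex engine treats specially.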