I have a file with tab-delimited fields (or columns) like this one below:
cat abc_table.txt
a b c
1 11;qqw 213
2 22 222
3 333;rs2 83838
I would like to remove everything after the ";" on only the second field.
I have tried with
awk 'BEGIN{FS=OFS="\t"} NR>=1 && sub (/;[*]/,"",$2){print $0}' abc_table.txt
but it does not seem to work. I also tried with sed:
's/;.*//g' abc_table.txt
but it erases also the strings in the third field:
a b c
1 11
2 22 222
3 333
The desired output is:
a b c
1 11 213
2 22 222
3 333 83838
If someone could help me, I would be very grateful!
You need to simply correct your regex.
awk '{sub(/;.*/,"",$2)} 1' Input_file
In case you have Input_file TAB delimited then try:
awk 'BEGIN{FS=OFS="\t"} {sub(/;.*/,"",$2)} 1' Input_file
Problem in OP's regex: OP's regex ;[*]
is looking for ;
and *
(literal character) in 2nd field that's why its NOT able to substitute everything after ;
in 2nd field. We need to simply give ;.*
which means grab everything from very first occurrence of ;
till last of 2nd field and then substitute with NULL in 2nd field.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With