
How can I remove a string after a specific character ONLY in a column/field in awk or bash?

Tags: sed, awk

I have a file with tab-delimited fields (or columns) like this one below:

cat abc_table.txt
a   b   c
1   11;qqw  213
2   22  222
3   333;rs2 83838

I would like to remove everything after the ";" on only the second field.

I have tried with

awk 'BEGIN{FS=OFS="\t"} NR>=1 && sub (/;[*]/,"",$2){print $0}' abc_table.txt

but it does not seem to work. I also tried with sed:

sed 's/;.*//g' abc_table.txt

but it also erases the strings in the third field:

a   b   c
1   11
2   22  222
3   333

The desired output is:

a   b   c
1   11  213
2   22  222
3   333 83838

If someone could help me, I would be very grateful!

Giuseppe D'alterio asked Nov 13 '20 09:11



1 Answer

You simply need to correct your regex:

awk '{sub(/;.*/,"",$2)} 1' Input_file

If your Input_file is TAB delimited, set FS and OFS as well so that the output stays TAB delimited:

awk 'BEGIN{FS=OFS="\t"} {sub(/;.*/,"",$2)} 1' Input_file
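To see why FS and OFS matter here, the following minimal sketch runs both commands on a single row taken from the question. The variable names (row, default_out, tab_out) are only illustrative. With the defaults, modifying $2 makes awk rebuild the record with single spaces; with FS=OFS="\t" the rebuilt record keeps its TABs. The final sed -n l calls just make the separators visible.

```shell
# One row from the question's table, written with real TAB separators.
row=$(printf '1\t11;qqw\t213')

# Default FS/OFS: changing $2 rebuilds the record using single spaces.
default_out=$(printf '%s\n' "$row" | awk '{sub(/;.*/,"",$2)} 1')

# FS and OFS both set to TAB: the rebuilt record keeps TAB separators.
tab_out=$(printf '%s\n' "$row" | awk 'BEGIN{FS=OFS="\t"} {sub(/;.*/,"",$2)} 1')

# Show both results with TABs rendered as \t so the difference is visible.
printf '%s\n' "$default_out" | sed -n l
printf '%s\n' "$tab_out" | sed -n l
```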

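Problem in OP's regex: ;[*] matches a ; followed by a literal * character (inside brackets, * is not a quantifier), so it never matches values such as 11;qqw and nothing is substituted in the 2nd field. The regex ;.* instead matches from the first ; to the end of the 2nd field, and sub() replaces that match with an empty string. Note also that OP's command uses sub() as the pattern, so even with the corrected regex it would print only the lines where a substitution happened; here sub() runs inside an action block and the trailing 1 prints every line.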
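As a quick check, here is a small sketch that recreates the sample table from the question in a temporary file (the tmpfile and result names are just for illustration) and runs the TAB-aware command on it. It should print the desired output shown in the question.

```shell
# Recreate the sample table from the question with real TAB separators.
tmpfile=$(mktemp)
printf 'a\tb\tc\n1\t11;qqw\t213\n2\t22\t222\n3\t333;rs2\t83838\n' > "$tmpfile"

# Remove ';' and everything after it in field 2 only.
# The trailing 1 is an always-true pattern, so every line is printed.
result=$(awk 'BEGIN{FS=OFS="\t"} {sub(/;.*/,"",$2)} 1' "$tmpfile")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$tmpfile"
```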

RavinderSingh13 answered Sep 26 '22 19:09
