I have a text file containing a number of lines formatted like the one below:
001_A.wav;112.680;115.211;;;Ja. Hello; Hi:
My goal is to clean up whatever comes after ;;;, meaning to delete the following characters: ,.:;()~?
I know I can do something like sed 's/[,.:;()~?]//g'. However, if I do that, the characters are removed from the whole line, which gives me
001_Awav112680115211Ja Hello Hi
However, I would like to delete those characters only after ;;;, so I would get
001_A.wav;112.680;115.211;;;Ja Hello Hi
How can I accomplish this task?
1st solution: Please try the following, written and tested with the shown samples in GNU awk (assuming ;;; occurs only once per line).
awk '
match($0,/.*;;;/){
  laterPart=substr($0,RSTART+RLENGTH)
  gsub(/[,.:;()~?]/,"",laterPart)
  print substr($0,RSTART,RLENGTH) laterPart
}' Input_file
Explanation: A detailed explanation of the above.
awk '                                        ##Starting the awk program here.
match($0,/.*;;;/){                           ##Using the match function to match everything up to and including ;;; here.
  laterPart=substr($0,RSTART+RLENGTH)        ##Creating the variable laterPart, which holds the rest of the line after the matched part.
  gsub(/[,.:;()~?]/,"",laterPart)            ##Globally substituting ,.:;()~? with an empty string in the laterPart variable.
  print substr($0,RSTART,RLENGTH) laterPart  ##Printing the matched substring followed by the laterPart variable here.
}' Input_file                                ##Mentioning the Input_file name here.
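One thing to be aware of: the program above only prints lines that contain ;;;, so any line without the marker is dropped from the output. If such lines should be kept as they are (an assumption, since the question does not say), a minimal sketch of a pass-through variant could look like this. The function name clean_after_marker and the second sample line are made up for illustration.

```shell
# Hypothetical helper name; wraps the match()-based approach from the 1st solution.
clean_after_marker() {
  awk '
  match($0, /.*;;;/) {
    head = substr($0, RSTART, RLENGTH)    # everything up to and including ;;;
    tail = substr($0, RSTART + RLENGTH)   # text after ;;;
    gsub(/[,.:;()~?]/, "", tail)          # strip the punctuation from the tail only
    print head tail
    next                                  # skip the pass-through rule below
  }
  { print }                               # line without ;;; is printed unchanged
  '
}

# The first line is the sample from the question, the second is an invented line without ;;;.
printf '%s\n' '001_A.wav;112.680;115.211;;;Ja. Hello; Hi:' 'no marker here.' | clean_after_marker
# prints:
# 001_A.wav;112.680;115.211;;;Ja Hello Hi
# no marker here.
```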
2nd solution: In case you have multiple occurrences of ;;; in a line and you want to remove the characters from all fields after the 1st occurrence of ;;;, then try the following.
awk 'BEGIN{FS=OFS=";;;"} {for(i=2;i<=NF;i++){gsub(/[,.:;()~?]/,"",$i)}} 1' Input_file
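To see how the two solutions differ when ;;; occurs more than once, here is a small sketch that runs both on the same line. The sample line and the function names clean_all_fields and clean_last_field are invented for illustration and are not from the question. Because .*;;; is greedy, the 1st solution only cleans the text after the last ;;;, while the 2nd solution cleans every field after the first one.

```shell
# Hypothetical sample line with two ;;; separators (not from the original question).
line='a.wav;1.5;;;Ja. Hi;;;Bye, now?'

# 2nd solution: split on ;;; and clean every field from the second one onwards.
clean_all_fields() {
  awk 'BEGIN{FS=OFS=";;;"} {for(i=2;i<=NF;i++){gsub(/[,.:;()~?]/,"",$i)}} 1'
}

# 1st solution as a one-liner: greedy .*;;; matches up to the last ;;; in the line.
clean_last_field() {
  awk 'match($0,/.*;;;/){t=substr($0,RSTART+RLENGTH); gsub(/[,.:;()~?]/,"",t); print substr($0,RSTART,RLENGTH) t}'
}

printf '%s\n' "$line" | clean_all_fields   # prints a.wav;1.5;;;Ja Hi;;;Bye now
printf '%s\n' "$line" | clean_last_field   # prints a.wav;1.5;;;Ja. Hi;;;Bye now
```

In both cases the part before the first ;;; (file name and timestamps) is left untouched.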