I have data that looks like this (TAB delimited):
Organ K ClustNo Analysis
LN K200 C12 Gene Ontology
LN K200 C116 Gene Ontology
CN K200 C2 Gene Ontology
What I want to do is remove the leading C from the 3rd column of every row except the header row:
Organ K ClustNo Analysis
LN K200 12 Gene Ontology
LN K200 116 Gene Ontology
CN K200 2 Gene Ontology
This won't do, because it removes the first C on each line, which affects other columns and the header row:
sed 's/C//'
What's the right way to do it?
awk is a good tool for this:
$ awk -F'\t' -v OFS='\t' 'NR>=2{sub(/^C/, "", $3)} 1' file
Organ K ClustNo Analysis
LN K200 12 Gene Ontology
LN K200 116 Gene Ontology
CN K200 2 Gene Ontology
-F'\t'
Use tab as the field delimiter on input.
-v OFS='\t'
Use tab as the field delimiter on output.
NR>=2 {sub(/^C/, "", $3)}
Remove the initial C from field 3, but only for lines after the first line.
1
This is awk's cryptic shorthand for print-the-line.
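To try the awk command end to end, here is a minimal sketch that rebuilds the sample data from the question and runs the same command on it. The file names sample.tsv and out.tsv are only placeholders for illustration.

```shell
# Recreate the tab-delimited sample from the question.
# (sample.tsv and out.tsv are hypothetical file names.)
printf 'Organ\tK\tClustNo\tAnalysis\n'   >  sample.tsv
printf 'LN\tK200\tC12\tGene Ontology\n'  >> sample.tsv
printf 'LN\tK200\tC116\tGene Ontology\n' >> sample.tsv
printf 'CN\tK200\tC2\tGene Ontology\n'   >> sample.tsv

# For every line after the header, strip a leading C from field 3.
# The trailing 1 prints each line, modified or not.
awk -F'\t' -v OFS='\t' 'NR>=2{sub(/^C/, "", $3)} 1' sample.tsv > out.tsv

cat out.tsv
```

Because the substitution is limited to $3 and anchored with ^, the C in the CN of column 1 and the C in the ClustNo header are left alone.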
sed can also do this:
$ sed -r '2,$ s/(([^\t]+\t){2})C/\1/' file
Organ K ClustNo Analysis
LN K200 12 Gene Ontology
LN K200 116 Gene Ontology
CN K200 2 Gene Ontology
-r
Use extended regular expressions. (On Mac OS X or other BSD platforms, use -E instead.)
2,$ s/(([^\t]+\t){2})C/\1/
This substitution is applied only for lines from 2 to the end of the file.
(([^\t]+\t){2})
matches the first two tab-separated columns. This assumes that exactly one tab separates each column. Because the regex is enclosed in parentheses, what it matches is available later as \1.
C
This matches the C that begins the third column.
\1
Replaces the matched text with just the first two columns, dropping the C.
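As a possible variant of the sed approach, the sketch below makes two changes that are my assumptions rather than part of the answer above. It anchors the pattern with ^ so the two captured columns must be the first two on the line, and it passes a literal tab through a shell variable, since some sed implementations may not interpret \t inside a regex. The file names and the last sample row (a third column with no leading C) are hypothetical.

```shell
# Build a small sample; the last row is a made-up case where column 3
# has no leading C but column 4 starts with one.
printf 'Organ\tK\tClustNo\tAnalysis\n'  >  sample_sed.tsv
printf 'LN\tK200\tC12\tGene Ontology\n' >> sample_sed.tsv
printf 'CN\tK200\tC2\tGene Ontology\n'  >> sample_sed.tsv
printf 'CN\tK200\t7\tCell Cycle\n'      >> sample_sed.tsv

# Hold a literal tab so the regex does not rely on sed understanding \t.
tab=$(printf '\t')

# From line 2 to the end: capture the first two columns (anchored at the
# start of the line) and drop a C that immediately follows them.
sed -E "2,\$ s/^(([^${tab}]+${tab}){2})C/\1/" sample_sed.tsv > out_sed.tsv

cat out_sed.tsv
```

Without the ^ anchor, the pattern could match starting at the second column on a row like the last one and remove the C from the fourth column instead.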