Regex replace on specific column with SED/AWK

Tags: linux, unix, sed, awk

I have data that looks like this (TAB delimited):

Organ K     ClustNo Analysis
LN    K200  C12     Gene Ontology
LN    K200  C116    Gene Ontology
CN    K200  C2      Gene Ontology

What I want to do is remove the leading C from the 3rd column of every row except the header row:

Organ K     ClustNo Analysis
LN    K200  12      Gene Ontology
LN    K200  116     Gene Ontology
CN    K200  2       Gene Ontology

This won't do, because it also affects other columns and the header row:

sed 's/C//'
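
To make the problem concrete, here is a small sketch that rebuilds the sample with printf and runs the naive command on it. The temporary file and the variable name naive are only for illustration.

```shell
# Recreate the sample data with real tab characters.
tmp=$(mktemp)
printf 'Organ\tK\tClustNo\tAnalysis\n' > "$tmp"
printf 'LN\tK200\tC12\tGene Ontology\n' >> "$tmp"
printf 'LN\tK200\tC116\tGene Ontology\n' >> "$tmp"
printf 'CN\tK200\tC2\tGene Ontology\n' >> "$tmp"

# s/C// deletes the first C on every line, wherever it occurs.
naive=$(sed 's/C//' "$tmp")
printf '%s\n' "$naive"
rm -f "$tmp"
```

On this data the header turns into "lustNo", and the last row loses the C of "CN" while its third column keeps "C2".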

What's the right way to do it?

neversaint asked Mar 17 '15 04:03


1 Answer

Using awk

awk is a good tool for this:

$ awk -F'\t' -v OFS='\t' 'NR>=2{sub(/^C/, "", $3)} 1' file
Organ   K       ClustNo Analysis
LN      K200    12      Gene Ontology
LN      K200    116     Gene Ontology
CN      K200    2       Gene Ontology

How it works

  • -F'\t'

    Use tab as the field delimiter on input.

  • -v OFS='\t'

    Use tab as the field delimiter on output.

  • NR>=2 {sub(/^C/, "", $3)}

    Remove the initial C from field 3 only for lines after the first line.

  • 1

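    This is awk's cryptic shorthand for print-the-line: 1 is a condition that is always true, and a condition with no action gets the default action, which is to print the line.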
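
For readers who find the one-liner terse, the same steps can be spelled out in long form. This is only a sketch: the temporary file and the variable name result are for illustration, and the BEGIN block is simply another way of setting the two delimiters. Setting OFS matters because assigning to $3 makes awk rebuild the line, joining the fields with OFS.

```shell
# Recreate the sample data with real tab characters.
tmp=$(mktemp)
printf 'Organ\tK\tClustNo\tAnalysis\n' > "$tmp"
printf 'LN\tK200\tC12\tGene Ontology\n' >> "$tmp"
printf 'LN\tK200\tC116\tGene Ontology\n' >> "$tmp"
printf 'CN\tK200\tC2\tGene Ontology\n' >> "$tmp"

result=$(awk '
BEGIN   { FS = OFS = "\t" }        # tab on input and on output
NR >= 2 { sub(/^C/, "", $3) }      # skip the header, strip a leading C from field 3
        { print }                  # explicit form of the bare 1 pattern
' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Because the input delimiter is a tab, "Gene Ontology" stays a single field even though it contains a space.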

Using sed

$ sed -r '2,$ s/(([^\t]+\t+){2})C/\1/' file
Organ   K       ClustNo Analysis
LN      K200    12      Gene Ontology
LN      K200    116     Gene Ontology
CN      K200    2       Gene Ontology

How it works

  • -r

    Use extended regular expressions. (On Mac OS X or other BSD platforms, use -E instead.)

  • 2,$ s/(([^\t]+\t+){2})C/\1/

    This substitution is applied only to lines 2 through the end of the file.

    (([^\t]+\t+){2}) matches the first two columns, each followed by one or more tabs, so it assumes that neither column is empty. Because the group is enclosed in parentheses, the text it matches is available later as \1.

    C matches the literal C that begins the third column.

    \1 replaces the matched text with just the first two columns, dropping the C.
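
A slightly more defensive variant is sketched below. It differs from the command above in two ways, both of which are my additions. First, a ^ anchor pins the match to the start of the line, so a row whose third column does not begin with C cannot be matched further to the right. Second, the tab is placed in a shell variable (named tab here) instead of being written as \t, since \t is understood by GNU sed but may not be by other sed implementations.

```shell
# Recreate the sample data with real tab characters.
tmp=$(mktemp)
printf 'Organ\tK\tClustNo\tAnalysis\n' > "$tmp"
printf 'LN\tK200\tC12\tGene Ontology\n' >> "$tmp"
printf 'LN\tK200\tC116\tGene Ontology\n' >> "$tmp"
printf 'CN\tK200\tC2\tGene Ontology\n' >> "$tmp"

# A literal tab character, so the regex does not depend on the \t escape.
tab=$(printf '\t')

# Lines 2..end: anchored match of two non-empty columns, then the leading C.
result=$(sed -E "2,\$ s/^(([^${tab}]+${tab}+){2})C/\\1/" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The double quotes are needed so the shell expands ${tab}; in exchange, the $ address and the \1 back-reference have to be escaped from the shell.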

John1024 answered Oct 20 '22 11:10