 

capturing groups in sed

I have many lines of the form:

ko04062	ko:CXCR3
ko04062	ko:CX3CR1
ko04062	ko:CCL3
ko04062	ko:CCL5
ko04080	ko:GZMA

and would dearly like to get rid of the ko: bit of the right-hand column. I'm trying to use sed, as follows:

echo "ko05414 ko:ITGA4" | sed 's/\(^ko\d{5}\)\tko:\(.*$\)/\1\2/' 

which simply outputs the original string I echo'd. I'm very new to command-line scripting, sed, pipes, etc., so please don't be too angry if/when I'm doing something extremely dumb.

The main thing that is confusing me is that the same thing happens if I reverse the \1\2 bit to read \2\1 or just use one group. This, I guess, implies that I'm missing something about the mechanics of piping the output of echo into sed, or that my regexp is wrong or that I'm using sed wrong or that sed isn't printing the results of the substitution.

Any help would be greatly appreciated!

asked Jul 21 '10 18:07 by Mike Dewar
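A quick way to tell whether a substitution matched at all is to run sed with -n (no automatic printing) and add the p flag to the s command, so a line is printed only if the substitution on it succeeded. A minimal sketch, assuming GNU sed (the \t escape is a GNU extension); printf is used instead of echo so the input is guaranteed to contain a real tab:

```shell
# -n turns off automatic printing; the p flag prints a line only when s succeeded.
# The question's pattern: prints nothing, because it never matches.
printf 'ko05414\tko:ITGA4\n' | sed -n 's/\(^ko\d{5}\)\tko:\(.*$\)/\1\2/p'

# A pattern that does match: prints ko05414<TAB>ITGA4
printf 'ko05414\tko:ITGA4\n' | sed -n 's/ko://p'
```

An empty result with -n and p points at the regular expression rather than at the pipe.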


People also ask

How do you use groups in SED?

Grouping works in sed as in other regular expressions. In sed's default (basic) syntax, a group is opened with “\(” and closed with “\)”. Grouping is often combined with back-referencing: a back-reference re-uses the part of the input that a group matched.
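A small illustration of grouping plus back-references in sed's default basic syntax; the sample text is made up:

```shell
# \( \) define two groups; \2 and \1 in the replacement refer back to them,
# so the two words are swapped.
echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/'
# prints: world hello
```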

What are capturing groups?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
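To see the "single unit" behaviour in sed, apply a repetition operator to the group. A sketch using sed -E (extended syntax, where plain parentheses group) and the equivalent basic syntax; the input string is invented for the example:

```shell
# With -E, (dog)+ repeats the whole group "dog", not just the final "g".
echo 'dogdogcat' | sed -E 's/(dog)+/X/'
# prints: Xcat

# Same idea in sed's default basic syntax: \(dog\)\{2\} means "dog" exactly twice.
echo 'dogdogcat' | sed 's/\(dog\)\{2\}/X/'
# prints: Xcat
```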

What is a capturing group in a JavaScript regex?

Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string. Backreferences refer to a previously captured group in the same regular expression.
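sed's basic regular expressions support the same kind of backreference inside the pattern itself: \1 must match exactly the text group 1 captured. A minimal sketch with made-up input that collapses a repeated word at the start of a line:

```shell
# ^ anchors the match; \([a-z]*\) captures the first word, and \1 then
# requires the very same word to follow after one space.
echo 'go go now' | sed 's/^\([a-z]*\) \1 /\1 /'
# prints: go now
```

The ^ anchor matters: without it, the leftmost match could start inside a longer word.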

What is S in sed command?

The s command is sed's substitution command. In some versions of sed, the expression must be preceded by -e to indicate that an expression follows. The s stands for substitute, while the g flag stands for global, which means that all matching occurrences on the line are replaced.
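The effect of the g flag, and of passing several expressions with -e, can be seen on a throwaway input:

```shell
# Without g, only the first match on the line is replaced.
echo 'a.b.c' | sed 's/\./-/'
# prints: a-b.c

# With g, every match on the line is replaced.
echo 'a.b.c' | sed 's/\./-/g'
# prints: a-b-c

# Each -e introduces one expression; they run in order on every line.
echo 'a.b.c' | sed -e 's/a/A/' -e 's/c/C/'
# prints: A.b.C
```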


2 Answers

sed is outputting its input because the substitution isn't matching. Since you're probably using GNU sed, try this:

printf 'ko05414\tko:ITGA4\n' | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\2/'
  • \d -> [0-9], since GNU sed doesn't recognize \d as a digit class
  • {} -> \{\}, since GNU sed uses basic regular expressions by default.
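The same fix written with sed -E (extended regular expressions), where ( ) and { } need no backslashes. This sketch assumes GNU sed, since \t in both the pattern and the replacement is a GNU extension. It also shows that \1\2, with nothing between the groups, drops the tab along with "ko:", while \1\t\2 keeps the two columns:

```shell
# Extended syntax: groups and the {5} interval are written without backslashes.
printf 'ko05414\tko:ITGA4\n' | sed -E 's/^(ko[0-9]{5})\tko:(.*)$/\1\t\2/'
# prints: ko05414<TAB>ITGA4

# Basic syntax with \1\2 adjacent: the tab disappears too.
printf 'ko05414\tko:ITGA4\n' | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\2/'
# prints: ko05414ITGA4
```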
answered Oct 06 '22 00:10 by ninjalj


This should do it. You can also skip the last group and simply use \1 instead, but since you're learning sed and regex, this is good stuff. I wanted to use a non-capturing group, (?:...), in the middle, but I couldn't get that to work with sed: the POSIX regular expressions sed uses don't support non-capturing groups.

sed --posix 's/\(^ko[0-9]\{5\}\)\( ko:\)\(.*$\)/\1 \3/g' file > result 
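Because sed has no non-capturing groups, the middle part can simply be matched without putting it in a group at all. A sketch of the shorter variant, assuming space-separated input as in the command above; the rest of the line is never matched, so it is left untouched:

```shell
# Only the ID is captured; " ko:" is matched and discarded, then a space is put back.
echo 'ko05414 ko:ITGA4' | sed 's/^\(ko[0-9]\{5\}\) ko:/\1 /'
# prints: ko05414 ITGA4
```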

And of course you can simply use:

sed --posix 's/ko://' 
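Applied to several lines in the question's tab-separated format, the plain s/ko:// works because the first column ("ko04062" and so on) contains no colon, so the first "ko:" on each line is always the one in the right-hand column. A sketch with a few sample lines taken from the question:

```shell
# s without g removes only the first "ko:" on each line.
printf 'ko04062\tko:CXCR3\nko04062\tko:CX3CR1\nko04080\tko:GZMA\n' | sed 's/ko://'
# prints:
# ko04062<TAB>CXCR3
# ko04062<TAB>CX3CR1
# ko04080<TAB>GZMA
```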
answered Oct 06 '22 01:10 by Anders