R: regex extracting a string between a dash and a period

Question

First of all, I apologize if this question is too naive or has been asked before. I searched the forum but could not find an answer, so I'm posting it as a question.

I have a data frame with row names as follows:

head(rownames(u))

[1] "A17-R-Null-C-3.AT2G41240"       "A18-R-Null-C-3.AT2G41240"       "B19-R-Null-C-3.AT2G41240"
[4] "B20-R-Null-C-3.AT2G41240"       "A21-R-Transgenic-C-3.AT2G41240" "A22-R-Transgenic-C-3.AT2G41240"

I want to use a regex in R to extract the substring between the first dash and the last period.

The expected results are:

[1] "R-Null-C-3"       "R-Null-C-3"       "R-Null-C-3"
[4] "R-Null-C-3"       "R-Transgenic-C-3" "R-Transgenic-C-3"

I tried the following, with no luck:

gsub("^[^-]*-|.+\\.","\\2", rownames(u))
gsub("^.+-","", rownames(u))
sub("^[^-]*.|\\..","", rownames(u))

Would someone be able to help me with this problem?

Thanks a lot in advance.

Shani.

Wiktor Stribiżew · Accepted Answer

Here is a solution using gsub:

v <- c("A17-R-Null-C-3.AT2G41240", "A18-R-Null-C-3.AT2G41240", "B19-R-Null-C-3.AT2G41240", "B20-R-Null-C-3.AT2G41240", "A21-R-Transgenic-C-3.AT2G41240", "A22-R-Transgenic-C-3.AT2G41240")
gsub("^[^-]*-([^.]+).*", "\\1", v)
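A quick way to check the pattern is to compare its output on the six sample names with the expected vector from the question. This is a minimal sketch; the names res_gsub, res_sub and expected are made up for illustration. Because the pattern is anchored with ^ and consumes the whole string, it can only match once per element, so sub() should give the same result as gsub().

```r
# The six sample row names from the question
v <- c("A17-R-Null-C-3.AT2G41240", "A18-R-Null-C-3.AT2G41240",
       "B19-R-Null-C-3.AT2G41240", "B20-R-Null-C-3.AT2G41240",
       "A21-R-Transgenic-C-3.AT2G41240", "A22-R-Transgenic-C-3.AT2G41240")

# Expected values, as listed in the question
expected <- c("R-Null-C-3", "R-Null-C-3", "R-Null-C-3",
              "R-Null-C-3", "R-Transgenic-C-3", "R-Transgenic-C-3")

# "\\1" in R source is the two-character replacement \1 (back-reference to group 1);
# a bare "\1" would be an octal escape, i.e. a control character, not a back-reference
res_gsub <- gsub("^[^-]*-([^.]+).*", "\\1", v)

# The pattern is anchored and consumes the whole string, so one substitution is enough
res_sub <- sub("^[^-]*-([^.]+).*", "\\1", v)

print(res_gsub)
identical(res_gsub, expected)  # TRUE
identical(res_sub, expected)   # TRUE
```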


The regex matches:

^[^-]* - the start of the string, followed by zero or more characters other than -
- - a hyphen
([^.]+) - Group 1 matching and capturing one or more characters other than a dot
.* - any characters (including newlines, since perl=TRUE is not used), zero or more times, up to the end of the string.
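Because ([^.]+) stops at the first dot after the first hyphen, the pattern above captures up to the first period, not the last one as the question literally asks. With the sample names there is only one period, so both readings agree. If names could contain more than one period, a variant that anchors on the last period might look like the sketch below; the second input string is hypothetical, made up only to show the difference.

```r
# Hypothetical input: the second name has an extra period (not from the question's data)
x <- c("A17-R-Null-C-3.AT2G41240", "A99-R-Null.v2-C-3.AT2G41240")

# Group 1 (.*) is greedy; \\.[^.]*$ forces the final dot to be one that is followed
# only by non-dot characters up to the end, i.e. the LAST period in the string
last_period <- sub("^[^-]*-(.*)\\.[^.]*$", "\\1", x)

# The accepted pattern stops at the FIRST period after the first hyphen
first_period <- sub("^[^-]*-([^.]+).*", "\\1", x)

last_period   # c("R-Null-C-3", "R-Null.v2-C-3")
first_period  # c("R-Null-C-3", "R-Null")
```

Note that a name with no period at all does not match the last-period pattern, so sub() returns it unchanged.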

Tags: regex, r

Shani A.
