 

Extract substring between "-" and "-" in a string in R

I have a list of strings that looks like this:

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")

For every string in the list, I need to extract the number between the first "-" and the second "-".

So I would get:

[10139833,10139832]

I tried this:

gsub(".*[-]([^-]+)[-]", "\\1", list)

But it returns:

[AC,bf]

What can I do to make it work? Thank you.

agnesa rivkin asked Oct 19 '25 08:10


2 Answers

Using str_extract from the stringr package, we can try:

library(stringr)

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- str_extract(list, "(?<=-)(\\d+)(?=-)")
nums

[1] "10139833" "10139832"

We could also use sub for a base R option:

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- sub(".*-(\\d+).*", "\\1", list)
nums

[1] "10139833" "10139832"
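
To see why the gsub attempt in the question returns the letters instead of the number, it may help to run it side by side with an anchored pattern. This is only a sketch: the variable names strings, attempt and fixed are assumptions made for illustration, and the anchored pattern is one possible variant rather than the only fix.

```r
strings <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Original attempt: the greedy .* runs up to the LAST "-x-" group, so the
# match covers "chr21-10139833-A-" and is replaced by the captured "A".
# The trailing "C" is not part of any match and is left in place.
attempt <- gsub(".*[-]([^-]+)[-]", "\\1", strings)
print(attempt)

# Anchored variant: skip everything up to the first hyphen, capture the
# second field, and consume the rest of the string.
fixed <- sub("^[^-]*-([^-]+)-.*$", "\\1", strings)
print(fixed)
```

Because the anchored pattern matches the whole string, nothing outside the captured field survives the replacement.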
Tim Biegeleisen answered Oct 20 '25 23:10


1) Using the input x shown in the Note at the end, use read.table. If you want character output instead, add the colClasses = "character" argument to read.table.

read.table(text = x, sep = "-")[[2]]
## [1] 10139833 10139832
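
The character-output variant mentioned above could look roughly like this. It is a minimal sketch that repeats the input x from the Note so it runs on its own; the name chr is an assumption made for illustration.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# colClasses = "character" is recycled across all columns, so the second
# field is returned as strings instead of being converted to integers.
chr <- read.table(text = x, sep = "-", colClasses = "character")[[2]]
print(chr)
```

Keeping the values as character avoids any numeric conversion of the extracted field.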

2) Another possibility is to use strapply from the gsubfn package. If you want character output, omit the as.numeric argument.

library(gsubfn)
strapply(x, "-(\\d+)-", as.numeric, simplify = TRUE)
## [1] 10139833 10139832

Note

x <- c("chr21-10139833-A-C", "chry-10139832-b-f")
G. Grothendieck answered Oct 20 '25 23:10