R: splitting a string between two characters using strsplit()

Tags:

split

r

strsplit

Let's say I have the following string:

s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

I would like to recover the strings between ";" and "=" to get the following output:

[1] "MIMAT0027618"  "MIMAT0027618"  "hsa-miR-6859-5p"  "MI0022705"

Can I use strsplit() with more than one split element?

biohazard asked Feb 09 '14 14:02


1 Answer

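1) strsplit with matrix. Yes: the split argument is a regular expression, so the character class "[;=]" splits on either character. The result alternates between keys and values, so arrange it in a two-row matrix and take the second row: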

> matrix(strsplit(s, "[;=]")[[1]], 2)[2,]
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"   
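
To see why the matrix step works, it may help to look at the intermediate values. This is only a sketch of the same one-liner broken into steps; the names parts, m and values are introduced here for illustration and are not part of the original answer.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# Splitting on ";" or "=" gives an alternating vector: key, value, key, value, ...
parts <- strsplit(s, "[;=]")[[1]]

# matrix() fills column by column, so with 2 rows each column is one key/value pair:
# row 1 holds the keys, row 2 holds the values.
m <- matrix(parts, nrow = 2)

# The second row is the requested output.
values <- m[2, ]
print(values)
```

Note that this relies on every field having exactly one "=", so that parts has an even length.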

2) strsplit with gsub. Alternatively, remove each key together with its equal sign using gsub, then split what is left on the semicolons:

> strsplit(gsub("[^=;]+=", "", s), ";")[[1]]
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"     
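
The gsub approach can be sketched in two steps to show what the pattern removes. The variable names stripped and values are assumptions made for this illustration.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# "[^=;]+=" matches a run of characters that are neither "=" nor ";",
# followed by "=", i.e. each key plus its equal sign.
stripped <- gsub("[^=;]+=", "", s)
# stripped is now "MIMAT0027618;MIMAT0027618;hsa-miR-6859-5p;MI0022705"

# Only the semicolons remain as separators.
values <- strsplit(stripped, ";")[[1]]
print(values)
```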

3) strsplit with sub. Or split on the semicolons first, then use sub to strip everything up to and including the equal sign from each piece:

> sub(".*=", "", strsplit(s, ";")[[1]])
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"   

4) strapplyc. Or use strapplyc from the gsubfn package, which here extracts the runs of consecutive non-semicolons that follow an equal sign:

> library(gsubfn)
> strapplyc(s, "=([^;]+)", simplify = unlist)
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"  
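
If installing gsubfn is not an option, the same extraction idea can probably be expressed in base R with gregexpr and regmatches. This is a substitute technique, not part of the original answer, and the names hits and values are assumptions for the sketch.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# Locate every run of non-semicolons that is immediately preceded by "=".
# perl = TRUE is required because the pattern uses a lookbehind, (?<==).
hits <- gregexpr("(?<==)[^;]+", s, perl = TRUE)

# regmatches() returns a list with one element per input string.
values <- regmatches(s, hits)[[1]]
print(values)
```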

ADDED: additional strsplit solutions.

G. Grothendieck answered Nov 09 '22 06:11