R strsplit before ( and after ) keeping both delimiters

I have a string that looks like the following:

x <- "01(01)121210(01)0001"

I want to split this into a vector so that I get the following:

[1] "0" "1" "(01)" "1" "2" "1" "2" "1" "0" "(01)" "0" "0" "0" "1"

The brackets ( ) could also be [ ] or { }, and there can be two or more digits between them.

I've been trying to do this by separating on the brackets first:

unlist(strsplit(x, "(?<=[\\]\\)\\}])", perl=T))
[1] "01(01)" "121210(01)" "0001"

or:

unlist(strsplit(x, "(?<=[\\[\\(\\{])", perl=T))
[1] "01(" "01)121210(" "01)0001"

but I can't find a way to combine the two. After that, I was hoping to split the elements that don't contain brackets into single characters.

I'd be really grateful if someone could help me out with this or knows of a more elegant way to do it.

Many thanks!

mamboSC4649 asked Aug 06 '14 12:08


2 Answers

Set perl = TRUE in strsplit and split the input string on the pattern below.

(?<!\(|^)(?!\)|\d\)|$)


Written as an R string, with the backslashes escaped, the regex is:

"(?<!\\(|^)(?!\\)|\\d\\)|$)"

Avinash Raj answered Oct 04 '22 03:10


This is another way:

unlist(strsplit(x, '\\([^)]*\\)(*SKIP)(*F)|(?=)', perl=T))
# [1] "0"    "1"    "(01)" "1"    "2"    "1"    "2"    "1"    "0"    "(01)" "0"    "0"    "0"    "1" 

\\([^)]*\\) matches a group in parentheses. (*SKIP)(*F) then forces that match to fail and tells the regular expression engine not to retry any position inside the text it just consumed, so the alternative on the other side of the | is never tested inside the parentheses. That alternative is (?=), an empty lookahead that matches the zero-width position between characters.
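
Since the question mentions that the brackets may also be [ ] or { }, the same idea can probably be extended by swapping the literal parentheses for character classes. This is a sketch and not part of the original answer; the sample string x2 and the names bracket_pattern and parts2 are made up for illustration:

```r
# Hypothetical input mixing square and curly brackets, with a three-digit group.
x2 <- "01[012]12{01}0"

# An opening bracket, any non-closing characters, and a closing bracket are
# skipped as one unit; everywhere else, split between characters.
bracket_pattern <- "[(\\[{][^)\\]}]*[)\\]}](*SKIP)(*F)|(?=)"

parts2 <- unlist(strsplit(x2, bracket_pattern, perl = TRUE))
print(parts2)
```

This version does not check that the opening and closing brackets are of the same kind, which seems acceptable for well-formed input.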

Matthew Plourde answered Oct 04 '22 01:10