
Using strsplit() in R, ignoring anything in parentheses

Tags: regex, r, strsplit
I'm trying to use strsplit() in R to break a string into pieces based on commas, but I don't want to split up anything in parentheses. I think the answer is a regex but I'm struggling to get the code right.

So for example:

x <- "This is it, isn't it (well, yes)"
> strsplit(x, ", ")
[[1]]
[1] "This is it"     "isn't it (well" "yes)" 

When what I would like is:

[1] "This is it"     "isn't it (well, yes)"
asked Feb 11 '16 18:02 by John Smith


1 Answer

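We can use a PCRE regex with two alternatives. The first matches from an opening ( through the last comma before the closing ) and discards that match with (*SKIP)(*FAIL), so commas inside parentheses are never used as split points. The second, a comma followed by zero or more whitespace characters (,\\s*), is what actually splits the string.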

 strsplit(x, '\\([^)]+,(*SKIP)(*FAIL)|,\\s*', perl=TRUE)[[1]]
 #[1] "This is it"           "isn't it (well, yes)"
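
As a variation on the pattern above (a sketch, not part of the original answer), the first alternative can skip the whole parenthesised group instead of stopping at its last comma, which some readers may find easier to follow. The helper name split_outside_parens and the second test string are made up for illustration, and the sketch assumes parentheses are balanced and not nested.

```r
# Hypothetical helper; the name is an assumption for illustration only.
split_outside_parens <- function(s) {
  # \([^)]*\)       matches one complete, non-nested "(...)" group
  # (*SKIP)(*FAIL)  throws that match away and resumes searching after it
  # ,\s*            the real delimiter: a comma plus any trailing whitespace
  strsplit(s, "\\([^)]*\\)(*SKIP)(*FAIL)|,\\s*", perl = TRUE)[[1]]
}

x <- "This is it, isn't it (well, yes)"
split_outside_parens(x)
# "This is it"  "isn't it (well, yes)"

# Several commas inside one pair of parentheses are all left alone.
split_outside_parens("a, b (c, d, e), f")
# "a"  "b (c, d, e)"  "f"
```

Note that perl = TRUE is required, because (*SKIP) and (*FAIL) are PCRE backtracking verbs and are not understood by R's default regex engine. Nested parentheses such as "(a, (b, c))" would need a recursive pattern and are not handled by this sketch.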
answered Oct 05 '22 17:10 by akrun