R: regular expression for 'not followed by' not working

Tags: regex, r

I need to keep the words enclosed in brackets and delete the others in the following string:

(a(b(c)d)(e)f)

So the result I expect is (((c))(e)). To delete a, b, d and f, I tried a 'not followed by' regex (a negative lookahead):

str <- "(a(b(c)d)(e)f)"
gsub("([a-z]+)(?!\\))", "", str) # remove any letters that aren't followed by ")"

The error message says my regex is invalid. As far as I can tell, the parentheses in the second part of the regex, "(?!\))", don't match up properly: my editor pairs the first "(" with the ")" right after it, which is not meant to be a closing bracket (the one to its right is). That is the only error I could make out. Can you tell me what is actually wrong? Is there another way to do this?
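A likely cause of the error: R's default regex engine (TRE) does not support lookarounds such as "(?!...)", so the pattern is only accepted with perl = TRUE, which switches to PCRE. The sketch below assumes lowercase letters only, as in the example. It shows that even with perl = TRUE the "not followed by )" rule keeps d and f, because they are also followed by ")". The combined lookbehind/lookahead pattern is a suggestion of mine, not taken from the answers: it deletes a run of letters unless that run is both preceded by "(" and followed by ")".

```r
str <- "(a(b(c)d)(e)f)"

# With perl = TRUE the original pattern is valid (PCRE supports (?!...)).
attempt <- gsub("([a-z]+)(?!\\))", "", str, perl = TRUE)
attempt
# [1] "(((c)d)(e)f)"
# d and f survive because they are also followed by ")".

# Assumed rule: keep a run of letters only if it is preceded by "(" AND
# followed by ")". The pattern deletes everything else:
#   alternative 1: a run whose first letter is not preceded by "(" (or by a
#                  letter, so matching always starts at the start of the run);
#   alternative 2: a run preceded by "(" whose last letter is not followed
#                  by ")" (or by a letter, so the whole run is consumed).
pattern <- "(?<![(a-z])[a-z]+|(?<=\\()[a-z]+(?![)a-z])"
fixed <- gsub(pattern, "", str, perl = TRUE)
fixed
# [1] "(((c))(e))"
```

Because both alternatives only match whole runs of letters, the same pattern should also handle multi-letter words, e.g. "(ab(cd)ef)" becomes "((cd))".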

jackson asked Jun 17 '12 18:06


2 Answers

In two steps, using positive lookaheads (which R's regex functions only support with perl=TRUE):

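# Step 1: drop a letter that sits between two "("
str1 <- gsub("\\([a-z](?=\\()", "\\(", str, perl=TRUE)
str1
# [1] "(((c)d)(e)f)"
# Step 2: drop a letter that sits between two ")"
str2 <- gsub("\\)[a-z](?=\\))", "\\)", str1, perl=TRUE)
str2
# [1] "(((c))(e))"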
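The patterns above match exactly one letter ("[a-z]"). If the bracketed items can be longer words, as the question's wording suggests, a variant with "[a-z]+" should work the same way. This is a sketch, and drop_outer_words is a hypothetical helper name. Like the original, it only removes a word that sits between two "(" or between two ")".

```r
# Hypothetical helper (not from the answer): the same two gsub() steps,
# with [a-z]+ instead of [a-z] so multi-letter words are removed too.
drop_outer_words <- function(x) {
  # Step 1: remove a word between "(" and a following "(", keeping the first "(".
  x <- gsub("\\([a-z]+(?=\\()", "(", x, perl = TRUE)
  # Step 2: remove a word between ")" and a following ")", keeping the first ")".
  gsub("\\)[a-z]+(?=\\))", ")", x, perl = TRUE)
}

drop_outer_words("(a(b(c)d)(e)f)")
# [1] "(((c))(e))"
drop_outer_words("(ab(cd)ef)")
# [1] "((cd))"
```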

Edit: it turns out you can even do it in one step:

gsub("([\\(\\)])[a-z](?=\\1)", "\\1", str, perl=TRUE)
# [1] "(((c))(e))"
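To see what the one-step pattern actually matches, gregexpr() and regmatches() can list the matched substrings. The capture group remembers which bracket was seen, and the lookahead "(?=\\1)" requires the next character to be that same bracket; since a lookahead consumes nothing, the second bracket is still available to start the next match. Each match is then replaced by "\\1", i.e. by the bracket alone. This is an illustrative sketch using the question's input.

```r
str <- "(a(b(c)d)(e)f)"

# gregexpr() locates every match of the one-step pattern;
# regmatches() pulls out the matched text.
hits <- regmatches(str, gregexpr("([\\(\\)])[a-z](?=\\1)", str, perl = TRUE))[[1]]
hits
# [1] "(a" "(b" ")d" ")f"
```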

flodel answered Oct 21 '22 03:10


I agree with @Dason's comment:

st <- "(a(b(c)d)(e)f)"

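# keep going while some "(" is followed by letters and another "("
while(grepl("\\([a-z]+\\(",st)) {
  # replace "(letters(...)letters)" by the inner "(...)"
  st <- sub("\\([a-z]+(\\(.+\\))[a-z]+\\)","\\1",st)
}
> st
[1] "(c)(e)"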

Written on my iPad :-)
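A sketch that records each intermediate value of the loop, assuming the same input. It shows that each sub() call strips one outer "(word ... word)" layer together with its brackets, which is why this approach returns "(c)(e)" rather than the "(((c))(e))" the question asked for. The steps vector and the no-change guard are additions for illustration, not part of the answer; the guard stops the loop when sub() finds nothing to replace, which would otherwise loop forever on inputs such as "(a(b)".

```r
st <- "(a(b(c)d)(e)f)"
steps <- st  # every intermediate value of st, starting with the input

while (grepl("\\([a-z]+\\(", st)) {
  # Replace "(letters(...)letters)" by the inner "(...)" part.
  new_st <- sub("\\([a-z]+(\\(.+\\))[a-z]+\\)", "\\1", st)
  if (identical(new_st, st)) break  # added guard: stop if nothing was replaced
  st <- new_st
  steps <- c(steps, st)
}

# steps holds "(a(b(c)d)(e)f)", then "(b(c)d)(e)", then "(c)(e)"
print(steps)
```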

Ari B. Friedman answered Oct 21 '22 05:10