
Extract info inside all parentheses in R

Tags: regex, r

I have a character string and want to extract the information inside multiple sets of parentheses. Currently I can extract the information from the last set of parentheses with the code below. How can I extract the contents of every set and return them as a vector?

j <- "What kind of cheese isn't your cheese? (wonder) Nacho cheese! (groan) (Laugh)"
sub("\\).*", "", sub(".*\\(", "", j))

Current output is:

[1] "Laugh" 

Desired output is:

[1] "wonder" "groan"  "Laugh"  
Tyler Rinker asked Dec 23 '11 07:12


1 Answer

Here is an example:

> gsub("[\\(\\)]", "", regmatches(j, gregexpr("\\(.*?\\)", j))[[1]])
[1] "wonder" "groan"  "Laugh"

I thought this should work as well:

> regmatches(j, gregexpr("(?=\\().*?(?<=\\))", j, perl=T))[[1]]
[1] "(wonder)" "(groan)"  "(Laugh)"

but the result includes the parentheses... why?
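
A small sketch of what is probably going on, using the same string as the question. Lookaheads and lookbehinds are zero-width: they check a position but consume nothing. The variable names `opens` and `closes` are only for illustration.

```r
j <- "What kind of cheese isn't your cheese? (wonder) Nacho cheese! (groan) (Laugh)"

# (?=\\() only asserts that "(" comes next; it does not consume it,
# so the "." that follows still matches the "(" itself.
opens <- regmatches(j, gregexpr("(?=\\().", j, perl = TRUE))[[1]]

# (?<=\\)) only asserts that ")" was just matched,
# so the "." in front of it has to be the ")" itself.
closes <- regmatches(j, gregexpr(".(?<=\\))", j, perl = TRUE))[[1]]

print(opens)
print(closes)
```

With the lookahead at the start and the lookbehind at the end, `.*?` is left to match both parentheses, so they end up in the result.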

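This works, with the lookbehind and lookahead swapped so that the parentheses are asserted but not included in the match: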

regmatches(j, gregexpr("(?<=\\().*?(?=\\))", j, perl=T))[[1]] 

Thanks @MartinMorgan for the comment.
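
If this is needed for more than one string, the lookaround pattern can be wrapped in a small helper. This is only a sketch: the name `extract_parens` is hypothetical and not part of base R, and it assumes the parentheses are not nested.

```r
# Hypothetical helper: returns a list with one character vector per input string.
extract_parens <- function(x) {
  # (?<=\\()  position just after "("
  # .*?       as few characters as possible
  # (?=\\))   position just before ")"
  regmatches(x, gregexpr("(?<=\\().*?(?=\\))", x, perl = TRUE))
}

j <- "What kind of cheese isn't your cheese? (wonder) Nacho cheese! (groan) (Laugh)"
res <- extract_parens(c(j, "no parentheses here"))

print(res[[1]])  # contents of the parentheses in j
print(res[[2]])  # empty character vector, since nothing matched
```

Returning a list keeps the result aligned with the input vector; use `[[1]]` when only a single string is passed in.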

kohske answered Sep 23 '22 19:09