
Using R to parse and return text in parentheses

Tags:

r

Let's say I have a string:

x <- "This is a string (Yay, string!)" 

I'd like to parse the string and return "Yay, string!"

How do I do that?

I tried a bunch of grep/grepl/gsub/sub/etc but couldn't find the right combination of regex or arguments. Sigh. I need to work on the regex skills.

Brandon Bertelsen asked Sep 10 '12 21:09



2 Answers

Here are two ways of doing it:

One: Match the entire string, capture the part you want, and replace the whole match with the captured part (this uses a backreference):

gsub(".*\\((.*)\\).*", "\\1", x)
[1] "Yay, string!"

This works because:

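  • The backreference \\1 refers to the text captured by the group (.*), i.e. whatever sits between the parentheses.
  • Since ( and ) are regex metacharacters, the literal parentheses in the string must be escaped as \\( and \\). They sit outside the capture group, so they are excluded from the result.
  • The .* at either end matches the rest of the string, so the entire string is replaced by the captured text.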
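
As a minimal sketch of the points above (the variable names pattern, inside, unchanged and safe are illustrative, not from the answer): sub() is enough here, since the pattern matches the whole string at most once. One thing to watch for is that a string without parentheses does not match at all, so it comes back unchanged; checking with grepl() first is one possible guard.

```r
x <- "This is a string (Yay, string!)"
pattern <- ".*\\((.*)\\).*"

# The whole string matches, so it is replaced by capture group 1.
inside <- sub(pattern, "\\1", x)
inside
# [1] "Yay, string!"

# No parentheses: the pattern does not match, so sub() returns the input as is.
no_parens <- "No parentheses here"
unchanged <- sub(pattern, "\\1", no_parens)
unchanged
# [1] "No parentheses here"

# One possible guard (an assumption about what you want): NA when nothing matches.
safe <- if (grepl(pattern, no_parens)) sub(pattern, "\\1", no_parens) else NA_character_
```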

Two: Replace all the bits you don't want with empty strings:

gsub(".*\\(|\\).*", "", x)
[1] "Yay, string!"

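This works because | is the regex alternation (OR) operator: the pattern matches either everything up to and including the opening parenthesis, or everything from the closing parenthesis onward, and gsub() replaces both matches with an empty string.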
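
To see what each side of the | contributes, the same result can be reached in two separate steps. This is only an illustrative sketch; step1, step2 and combined are names made up for the example.

```r
x <- "This is a string (Yay, string!)"

# Left alternative: strip everything up to and including the opening parenthesis.
step1 <- sub(".*\\(", "", x)
step1
# [1] "Yay, string!)"

# Right alternative: strip the closing parenthesis and anything after it.
step2 <- sub("\\).*", "", step1)
step2
# [1] "Yay, string!"

# Both alternatives in one pattern; gsub() is needed because there are two matches.
combined <- gsub(".*\\(|\\).*", "", x)
```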

Andrie answered Sep 24 '22 02:09



Also, if some of your strings might contain several parenthesized substrings, all of which you want to extract, use the regex power-tools gregexpr() and regmatches():

x <- "This is (a) string (Yay, string!)" 
pat <- "(?<=\\()([^()]*)(?=\\))"
regmatches(x, gregexpr(pat, x, perl=TRUE))
# [[1]]
# [1] "a"            "Yay, string!"
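
If this is needed for more than one string, the call can be wrapped in a small helper. The sketch below is an assumption about how one might package it, not part of the answer: extract_parens is a hypothetical name, and the pattern drops the capture group, which the lookbehind (?<=\\() and lookahead (?=\\)) make unnecessary. Both gregexpr() and regmatches() are vectorised, so the result is a list with one character vector per input string.

```r
# Hypothetical helper: return the text inside every pair of parentheses.
extract_parens <- function(x) {
  # Lookbehind and lookahead require the surrounding parentheses
  # without including them in the match; perl = TRUE enables both.
  pat <- "(?<=\\()[^()]*(?=\\))"
  regmatches(x, gregexpr(pat, x, perl = TRUE))
}

strings <- c("This is (a) string (Yay, string!)", "No parentheses here")
res <- extract_parens(strings)

res[[1]]
# [1] "a"            "Yay, string!"

# Strings without a match yield an empty character vector.
length(res[[2]])
# [1] 0
```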
Josh O'Brien answered Sep 25 '22 02:09
