Replace text within parentheses in R [duplicate]

Tags:

r

Possible Duplicate:
Remove text inside brackets, parens, and/or braces

I would like to remove the parentheses, and the text between them, throughout a large text file.

Example input (content in the text file):

Keep me (Remove Me 1). Again keep me (Remove Me 2). Again again keep me (Remove Me 3).

Output (content in a new text file):

Keep me. Again keep me. Again again keep me. 

Is it possible to do this in R (say using grep)?

user961932 asked Nov 23 '12 12:11

1 Answer

Yes, use gsub() to replace all the text you don't want with an empty string.

x <- "Keep me (Remove Me 1). Again keep me (Remove Me 2). Again again keep me (Remove Me 3)."

Here is the regex you want:

gsub( " *\\(.*?\\) *", "", x)
[1] "Keep me. Again keep me. Again again keep me."

It works like this:

  • " *" (a space followed by *) matches zero or more spaces before (and after) the parentheses.
  • Since ( and ) are special symbols in a regex, you need to escape them, i.e. \\( and \\). The backslash is doubled because it must itself be escaped inside an R string.
  • .*? matches any sequence of characters, where the ? makes the match non-greedy. This is necessary because regex quantifiers are greedy by default: without the ?, the match would start at the first opening parenthesis and end at the last closing parenthesis.
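
To see the difference the ? makes, and to apply the pattern to a file as the question asks, here is a minimal sketch. The file handling is an assumption on my part: the paths are temporary placeholders created with tempfile(), and the file is processed line by line with readLines() and writeLines().

```r
x <- "Keep me (Remove Me 1). Again keep me (Remove Me 2). Again again keep me (Remove Me 3)."

# Greedy: .* runs from the first "(" to the last ")",
# so the text between the groups is removed as well.
greedy <- gsub(" *\\(.*\\) *", "", x)

# Non-greedy: .*? stops at the nearest ")",
# so each parenthesised group is removed separately.
lazy <- gsub(" *\\(.*?\\) *", "", x)

print(greedy)
print(lazy)

# Apply the same pattern to a text file.
# infile and outfile are placeholder paths, not names from the question.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(x, infile)

lines   <- readLines(infile)                   # one element per line
cleaned <- gsub(" *\\(.*?\\) *", "", lines)    # gsub is vectorised over lines
writeLines(cleaned, outfile)
```

Because the file is processed one line at a time, a group that opens on one line and closes on another is left untouched, and nested parentheses are not handled by this pattern.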
Andrie answered Sep 30 '22 03:09