I'm using regular expressions in R. I am trying to pick out parenthesized content at the end of some strings in a character vector. I'm able to find the parenthesized content when it exists, but I'm failing to exclude the non-parenthesized content in inputs that don't have parens.
Example:
> x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")
> gsub("(.*?)(\\(.*\\))", "\\2", x)
[1] "DECIMAL" "(14,5)" "(1)"
The last 2 elements in output are correct, the first one is not. I want
c("", "(14,5)", "(1)")
The input can have anything, literally any word or number characters, before the parenthesized content.
You can use
sub("^.*?(\\(.*\\))?$", "\\1", x, perl=TRUE)
See the regex demo. Details:
^ - start of string
.*? - any zero or more chars other than line break chars (since it is a PCRE regex, see perl=TRUE), as few as possible
(\\(.*\\))? - an optional Group 1: a (, then any zero or more chars other than line break chars, as many as possible, and then a )
$ - end of string.
See the R demo:
x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")
sub("^.*?(\\(.*\\))?$", "\\1", x, perl=TRUE)
## => [1] "" "(14,5)" "(1)"
NOTE: perl=TRUE is very important in this case because the pattern mixes a lazy quantifier (.*?) with a greedy one (.*), and R's default TRE engine does not handle quantifiers of different greediness reliably.
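If you would rather not depend on mixed greediness at all, a possible alternative is to locate the trailing parenthesized part with regexpr() and fill in "" where nothing matches. This is only a sketch: the helper name extract_trailing_parens is made up for illustration and is not part of the original answer, and it assumes the input contains no NA values.

```r
# Hypothetical helper (not from the original answer): return the parenthesized
# text at the end of each string, or "" when there is none.
# Assumes x is a character vector without NA values.
extract_trailing_parens <- function(x) {
  # regexpr() returns the start of the first match per element, or -1 if none.
  # The pattern uses only a greedy quantifier, so the default engine is fine.
  m <- regexpr("\\(.*\\)$", x)
  out <- rep("", length(x))
  # regmatches() returns only the matched substrings, in input order
  out[m != -1] <- regmatches(x, m)
  out
}

x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")
result <- extract_trailing_parens(x)
print(result)
```

For the sample vector, result should be identical to c("", "(14,5)", "(1)"). Like the sub() pattern, this starts at the first ( in the string, so an input with several parenthesized groups yields everything from the first ( to the final ).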