I am trying to replace commas within all sets of parentheses with a semicolon, but not change any commas outside of the parentheses.
So, for example:
"a, b, c (1, 2, 3), d, e (4, 5)"
should become:
"a, b, c (1; 2; 3), d, e (4; 5)"
I have started attempting this with gsub, but I am having a really hard time figuring out how to identify the commas within the parentheses.
I would call myself an advanced beginner with R, but with regular expressions and text manipulations, a total noob. Any help you can provide would be great.
The simplest solution
The most common workaround, which works as long as all parentheses are balanced and not nested:
,(?=[^()]*\))
R code:
a <- "a, b, c (1, 2, 3), d, e (4, 5)"
gsub(",(?=[^()]*\\))", ";", a, perl=TRUE)
## [1] "a, b, c (1; 2; 3), d, e (4; 5)"
The regex matches:
, - a comma, if...
(?=[^()]*\)) - it is followed by 0 or more characters other than ( or ) (with [^()]*) and then a literal ).
Alternative solutions
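The lookahead only checks that a ) appears before the next ( or ); it never looks back for the opening parenthesis. The following is a small sketch of where that falls short, using made-up input strings that are not from the question:

```r
# The lookahead pattern from the simplest solution.
pattern <- ",(?=[^()]*\\))"

# Flat, balanced input: behaves as intended.
flat <- gsub(pattern, ";", "a, b (1, 2), c", perl = TRUE)

# Nested input: a comma that is followed by "(" before any ")" is skipped.
nested <- gsub(pattern, ";", "a (1, (2, 3), 4)", perl = TRUE)

# Unbalanced input: a stray ")" makes a comma outside any group match.
stray <- gsub(pattern, ";", "a, b) c", perl = TRUE)

print(flat)
## [1] "a, b (1; 2), c"
print(nested)
## [1] "a (1, (2; 3); 4)"
print(stray)
## [1] "a; b) c"
```

If your data can contain nested or unbalanced parentheses, one of the approaches below is likely the safer choice.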
If you need to make sure that only commas between a matching pair of opening and closing parentheses are replaced, it is safer to use a gsubfn-based approach:
library(gsubfn)
x <- 'a, b, c (1, 2, 3), d, e (4, 5)'
gsubfn('\\(([^()]*)\\)', function(match) gsub(',', ';', match, fixed=TRUE), x, backref=0)
## => [1] "a, b, c (1; 2; 3), d, e (4; 5)"
Here, \(([^()]*)\) matches (, then 0+ chars other than ( and ), and then ). Because backref=0 is set, the whole match is passed to the anonymous function, where all , chars are replaced with semicolons using gsub.
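If you would rather not depend on gsubfn, the same match-then-rewrite idea can be sketched in base R with gregexpr and regmatches. The helper name semicolons_in_parens is made up for this illustration, and the sketch assumes the parentheses are not nested:

```r
# Hypothetical helper name; base R only, no gsubfn required.
semicolons_in_parens <- function(x) {
  # Locate every "(...)" group that contains no further parentheses.
  m <- gregexpr("\\([^()]*\\)", x)
  # Replace the commas inside each matched group, then write the groups back.
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(groups) gsub(",", ";", groups, fixed = TRUE)
  )
  x
}

result <- semicolons_in_parens("a, b, c (1, 2, 3), d, e (4, 5)")
print(result)
## [1] "a, b, c (1; 2; 3), d, e (4; 5)"
```

Strings without any parentheses are returned unchanged, since there is nothing to write back.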
If you need to perform this replacement inside balanced parentheses of unknown nesting depth, use a PCRE regex with gsubfn:
x1 <- 'a, b, c (1, (2, (3, 4)), 5), d, e (4, 5)'
gsubfn('\\(((?:[^()]++|(?R))*)\\)', function(match) gsub(',', ';', match, fixed=TRUE), x1, backref=0, perl=TRUE)
## => [1] "a, b, c (1; (2; (3; 4)); 5), d, e (4; 5)"
Pattern details
\( # Open parenthesis
( # Start group 1
(?: # Start of a non-capturing group:
[^()]++ # Any 1 or more chars other than '(' and ')'
| # OR
(?R) # Recursively match the entire pattern
)* # End of the non-capturing group and repeat it zero or more times
) # End of Group 1 (with `backref=0`, the whole match, not just this group, is passed to the function as `match`)
\) # A literal ')'
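The recursive pattern can also be tried in base R, since gregexpr accepts PCRE syntax when perl = TRUE. This is a sketch rather than a drop-in replacement for the gsubfn call: the helper name semicolons_in_nested_parens is hypothetical, and the capturing group is dropped because only the whole match is needed.

```r
# Hypothetical helper name; relies on PCRE recursion via perl = TRUE.
semicolons_in_nested_parens <- function(x) {
  # Match each outermost balanced "(...)" group, including anything nested in it.
  m <- gregexpr("\\((?:[^()]++|(?R))*\\)", x, perl = TRUE)
  # Replace every comma inside each matched group, then write the groups back.
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(groups) gsub(",", ";", groups, fixed = TRUE)
  )
  x
}

nested_result <- semicolons_in_nested_parens("a, b, c (1, (2, (3, 4)), 5), d, e (4, 5)")
print(nested_result)
## [1] "a, b, c (1; (2; (3; 4)); 5), d, e (4; 5)"
```

Because each outermost group is matched as a whole, the commas at every nesting level inside it are replaced in a single pass.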