I need a regex that replaces all backslashes \\ with \" unless the \\ is between two dollar signs, as in $\\bar{x}$. I don't know how to say in regex "replace all of these unless they fall between these two characters".
Here's a string and a gsub that gets rid of all \\, even between the dollar signs:
x <- c("I like \\the big\\ red \\dog\\ $\\hat + \\bar$, here it is $\\bar{x}$",
"I have $50 to \\spend\\", "$\\frac{4}{5}$ is nice", "$\\30\\ is nice too")
gsub("\\\\", "\"", x)
## > gsub("\\\\", "\"", x)
## [1] "I like \"the big\" red \"dog\" $\"hat + \"bar$, here it is $\"bar{x}$"
## [2] "I have $50 to \"spend\""
## [3] "$\"frac{4}{5}$ is nice"
## [4] "$\"30\" is nice too"
What I am after is:
## [1] "I like \"the big\" red \"dog\" $\\hat + \\bar$, here it is $\\bar{x}$"
## [2] "I have $50 to \"spend\""
## [3] "$\\frac{4}{5}$ is nice"
## [4] "$\"30\" is nice too"
Using the strsplit method of @FrankieTheKneeMan:
x <- c("I like \\the big\\ red \\dog\\ $\\hat + \\bar$, here it is $\\bar{x}$",
"I have $50 to \\spend\\",
"$\\frac{4}{5}$ is nice",
"$\\30\\ is nice too")
# > cat(x, sep='\n')
# I like \the big\ red \dog\ $\hat + \bar$, here it is $\bar{x}$
# I have $50 to \spend\
# $\frac{4}{5}$ is nice
# $\30\ is nice too
# split into parts separated by '$'.
# Add a space at the end of every string to deal with a '$' at the end of
# the string (as strsplit('a$', '$', fixed=TRUE) is just 'a' in R)
bits <- strsplit(paste(x, ''), '$', fixed=TRUE)
# apply the regex to every second part (starting with the first)
# and always to the last bit (because of the ' ' we added)
out <- sapply(bits, function (x) {
    idx <- unique(c(seq(1, length(x), by=2), length(x)))
    x[idx] <- gsub('\\', '\"', x[idx], fixed=TRUE)
    # join back together
    x <- paste(x, collapse='$')
    # remove that last ' ' we added
    substring(x, 1, nchar(x) - 1)
}, USE.NAMES=FALSE)
# > cat(out, sep='\n')
# I like "the big" red "dog" $\hat + \bar$, here it is $\bar{x}$
# I have $50 to "spend"
# $\frac{4}{5}$ is nice
# $"30" is nice too
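This will always have cases in which it fails (for example "I have $20. \\hi\\ Now I have $30", where the text between the two dollar signs is not math but still keeps its backslashes), so you will have to keep that in mind and test it against other strings of the format you are expecting.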
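To make that failure concrete, here is a small sketch that wraps the steps above in a helper and runs it on the problem string. The name replace_outside_dollars is made up for illustration; the logic is the strsplit approach shown above.

```r
# Hypothetical helper wrapping the strsplit approach shown above.
replace_outside_dollars <- function(x) {
    # pad with a trailing space so a '$' at the very end still yields a final piece
    bits <- strsplit(paste0(x, " "), "$", fixed = TRUE)
    vapply(bits, function(parts) {
        # odd-numbered pieces are treated as outside $...$; the last piece always is
        idx <- unique(c(seq(1, length(parts), by = 2), length(parts)))
        parts[idx] <- gsub("\\", "\"", parts[idx], fixed = TRUE)
        joined <- paste(parts, collapse = "$")
        substring(joined, 1, nchar(joined) - 1)  # drop the padding space
    }, character(1), USE.NAMES = FALSE)
}

tricky <- "I have $20. \\hi\\ Now I have $30"
result <- replace_outside_dollars(tricky)
writeLines(result)

# The backslashes around "hi" sit between the two '$', so nothing is replaced
identical(result, tricky)
```

Because the split is purely positional, any two dollar amounts in one sentence will shield the text between them.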
If you ignore the content-dependent problem, then it is possible to do the replacement with a PCRE regex. (It is possible to patch it on a case-by-case basis, if the $ that doesn't mark a region where \ is preserved has an unambiguous form.)
This assumes that $ always starts and ends a non-replacement region, except for the case of the odd last $ in the string.
Pattern (the first line is the raw regex, the second line is the quoted string literal):
\G((?:[^$\\]|\$[^$]*+\$|\$(?![^$]*+\$))*+)\\
"\\G((?:[^$\\\\]|\\$[^$]*+\\$|\\$(?![^$]*+\\$))*+)\\\\"
Replace string:
\1"
"\\1\""
The idea is to find the next \ in the string that is not contained within two $. This is achieved by making sure the match always starts from where the last match left off (\G), so that we never skip over a literal $ and match a \ inside a protected region.
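As a rough sketch of how the pattern and replace string above could be applied from R (the language of the question), both quoted literals can be passed to gsub() with perl = TRUE. This assumes R's PCRE-backed gsub() keeps \G anchored at the end of the previous match, which is worth checking on your R version. x is the vector from the question.

```r
x <- c("I like \\the big\\ red \\dog\\ $\\hat + \\bar$, here it is $\\bar{x}$",
       "I have $50 to \\spend\\",
       "$\\frac{4}{5}$ is nice",
       "$\\30\\ is nice too")

pattern     <- "\\G((?:[^$\\\\]|\\$[^$]*+\\$|\\$(?![^$]*+\\$))*+)\\\\"
replacement <- "\\1\""

# perl = TRUE selects the PCRE engine, which is needed for \G,
# the possessive quantifiers (*+) and the negative lookahead
out <- gsub(pattern, replacement, x, perl = TRUE)
writeLines(out)
```

Each match consumes everything up to and including the next replaceable backslash, and the capture group puts the skipped text back in front of the quote.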
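There are 3 forms of sequences that we don't replace:
Any character that is neither $ nor a literal \: [^$\\]
Any text between two $ (this doesn't take into account an escaping mechanism, if any): \$[^$]*+\$
An odd last $ with no $ after it, which is skipped so that a \ after the odd last $ can still be replaced: \$(?![^$]*+\$)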
So we just march through any combination of the 3 forms of sequences above, and match the nearest \ for replacement.
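One way to keep the three cases readable is to name each alternative and assemble the pattern from them. This is only a sketch: the variable names are invented here, and the assembled string is meant to be identical to the quoted pattern above. It again assumes gsub() with perl = TRUE.

```r
plain_char  <- "[^$\\\\]"            # any character that is neither $ nor a backslash
dollar_pair <- "\\$[^$]*+\\$"        # a $...$ region, skipped as a whole
lone_dollar <- "\\$(?![^$]*+\\$)"    # a $ with no later $ to pair with

# \G, then any run of the three skippable forms, then the backslash to replace
skippable <- paste(plain_char, dollar_pair, lone_dollar, sep = "|")
pattern   <- paste0("\\G((?:", skippable, ")*+)\\\\")

demo <- gsub(pattern, "\\1\"", "a \\b\\ $\\c$ d $ \\e\\", perl = TRUE)
writeLines(demo)
```

The order of the alternatives matters: the paired form is tried before the lone-$ form, so a $ is only treated as unpaired when no closing $ follows it.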
A variant of the pattern makes one more assumption: $<digit> will not start a non-replacement region. This will work even with this kind of string:
I have $50 to \spend\. I just $\bar$ remembered that I have another $30 dollars $\left$ from my last \paycheck\. Lone $ \at the end\
Pattern:
\G((?:[^$\\]|\$\d|\$(?![^$]*\$)|\$[^$]*+\$)*+)\\
"\\G((?:[^$\\\\]|\\$\\d|\\$(?![^$]*\\$)|\\$[^$]*+\\$)*+)\\\\"
\$\d is added in front of the \$[^$]*+\$ in the alternation to make the engine check for that case first.
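A short sketch of this variant in R, under the same assumption that gsub() with perl = TRUE handles \G the way PCRE does. It runs the pattern on the long sample string and on the "I have $20. ..." string that the strsplit approach gets wrong. The variable names are illustrative.

```r
pattern <- "\\G((?:[^$\\\\]|\\$\\d|\\$(?![^$]*\\$)|\\$[^$]*+\\$)*+)\\\\"

samples <- c(
    paste0("I have $50 to \\spend\\. I just $\\bar$ remembered that I have ",
           "another $30 dollars $\\left$ from my last \\paycheck\\. ",
           "Lone $ \\at the end\\"),
    "I have $20. \\hi\\ Now I have $30"
)

# $<digit> is consumed as ordinary text, so dollar amounts never open a region
out <- gsub(pattern, "\\1\"", samples, perl = TRUE)
writeLines(out)
```

A math region that starts with a digit, such as $2x$, would no longer be protected, so this variant trades one ambiguity for another.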