Replace string unless between two points

Question

I need a regex that replaces every backslash \ with a double quote " unless the \ is between two dollar signs, as in $\bar{x}$. I don't know how to say in regex: replace all of these unless they fall between these two characters.

Here's a string and a gsub that gets rid of all \, even the ones between dollar signs:

x <- c("I like \\the big\\ red \\dog\\ $\\hat + \\bar$, here it is $\\bar{x}$",
    "I have $50 to \\spend\\", "$\\frac{4}{5}$ is nice", "$\\30\\ is nice too") 

gsub("\\\\", "\"", x)

## > gsub("\\\\", "\"", x)
## [1] "I like \"the big\" red \"dog\" $\"hat + \"bar$, here it is $\"bar{x}$" 
## [2] "I have $50 to \"spend\""    
## [3] "$\"frac{4}{5}$ is nice"   
## [4] "$\"30\" is nice too"

What I am after is:

## [1] "I like \"the big\" red \"dog\" $\\hat + \\bar$, here it is $\\bar{x}$" 
## [2] "I have $50 to \"spend\""
## [3] "$\\frac{4}{5}$ is nice"   
## [4] "$\"30\" is nice too"

mathematical.coffee · Accepted Answer

Using the strsplit method of @FrankieTheKneeMan:

x <- c("I like \\the big\\ red \\dog\\ $\\hat + \\bar$, here it is $\\bar{x}$",
       "I have $50 to \\spend\\",
       "$\\frac{4}{5}$ is nice",
       "$\\30\\ is nice too") 

# > cat(x, sep='\n')
# I like \the big\ red \dog\ $\hat + \bar$, here it is $\bar{x}$
# I have $50 to \spend\
# $\frac{4}{5}$ is nice
# $\30\ is nice too

# split into parts separated by '$'.
# Add a space at the end of every string to deal with '$'
#  at the end of the string (as
#      strsplit('a$', '$', fixed=T)
#  is just 'a' in R)
bits <- strsplit(paste(x, ''), '$', fixed=T)

# apply the regex to every second part (starting with the first)
# and always to the last bit (because of the ' ' we added)
out <- sapply(bits, function (x) {
                   idx <- unique(c(seq(1, length(x), by=2), length(x)))
                   x[idx] <- gsub('\\', '\"', x[idx], fixed=T)
                   # join back together
                   x <- paste(x, collapse='$')
                   # remove that last ' ' we added
                   substring(x, 1, nchar(x) - 1)
               }, USE.NAMES=F)

# > cat(out, sep='\n')
# I like "the big" red "dog" $\hat + \bar$, here it is $\bar{x}$
# I have $50 to "spend"
# $\frac{4}{5}$ is nice
# $"30" is nice too

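This approach will always have cases in which it fails. For example, in "I have $20. \hi\ Now I have $30" the text between the two dollar amounts is treated as a $...$ region, so its backslashes are left alone. Keep that in mind and test it against other strings of the format you are expecting.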
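
To see that failure concretely, the steps above can be wrapped in a helper and run on that string. This is only a sketch: the name `replace_outside_dollars` is made up for illustration, and the body is the strsplit approach from this answer, not a new method.

```r
# Hypothetical helper (name is an assumption) wrapping the strsplit approach above.
replace_outside_dollars <- function(x) {
  # Pad with a trailing space so a '$' at the very end still yields a final piece
  bits <- strsplit(paste(x, ''), '$', fixed = TRUE)
  sapply(bits, function(b) {
    # Odd-numbered pieces are treated as outside '$...$'; the last piece always is
    idx <- unique(c(seq(1, length(b), by = 2), length(b)))
    b[idx] <- gsub('\\', '"', b[idx], fixed = TRUE)
    joined <- paste(b, collapse = '$')
    substring(joined, 1, nchar(joined) - 1)  # drop the padding space
  }, USE.NAMES = FALSE)
}

ok  <- replace_outside_dollars("I have $50 to \\spend\\")
bad <- replace_outside_dollars("I have $20. \\hi\\ Now I have $30")
cat(ok, bad, sep = '\n')
# I have $50 to "spend"
# I have $20. \hi\ Now I have $30
```

The second string comes back unchanged because "20. \hi\ Now I have " is an even-numbered piece and is therefore skipped.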

nhahtdh · Answer

If you ignore the content-dependent problem (whether a given $ opens a region or is just a dollar sign), it is possible to do the replacement with a PCRE regex. (It can be patched on a case-by-case basis if the $ that does not delimit a region where \ is preserved has an unambiguous form.)

Assumes that `$` always starts and ends a non-replacement region, except for the case of the odd last `$` in the string.

Pattern (the first line is RAW regex, the second line is quoted string literal):

\G((?:[^$\\]|\$[^$]*+\$|\$(?![^$]*+\$))*+)\\
"\\G((?:[^$\\\\]|\\$[^$]*+\\$|\\$(?![^$]*+\\$))*+)\\\\"

Replace string:

\1"
"\\1\""

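
As a minimal sketch of how this could be used from R (an assumption on my part: `gsub` with `perl = TRUE`, so the PCRE engine is used and `\G` and the possessive quantifiers are available), here it is applied to the vector from the question. The names `pat` and `out` are only for illustration.

```r
# The question's input, with each backslash escaped for R.
x <- c("I like \\the big\\ red \\dog\\ $\\hat + \\bar$, here it is $\\bar{x}$",
       "I have $50 to \\spend\\",
       "$\\frac{4}{5}$ is nice",
       "$\\30\\ is nice too")

# The quoted string literal form of the pattern (every regex backslash doubled).
pat <- "\\G((?:[^$\\\\]|\\$[^$]*+\\$|\\$(?![^$]*+\\$))*+)\\\\"

# "\\1\"" puts back the text that was skipped over and appends a double quote.
out <- gsub(pat, "\\1\"", x, perl = TRUE)
cat(out, sep = "\n")
# I like "the big" red "dog" $\hat + \bar$, here it is $\bar{x}$
# I have $50 to "spend"
# $\frac{4}{5}$ is nice
# $"30" is nice too
```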

Explanation

The idea is to find the next \ in the string that is not contained within 2 $. This is achieved by making sure each match starts where the last match left off (\G), so that we never skip over a literal $ and match a \ inside.

There are 3 forms of sequences that we don't replace:

Any character that is neither a literal $ nor a literal \: [^$\\]
Any text between 2 $ (this doesn't take any escaping mechanism into account): \$[^$]*+\$
The odd last $, consumed on its own so that a \ after it can still be replaced: \$(?![^$]*+\$)

So we just march through any combination of the 3 forms of sequences above, and match the nearest \ for replacement.
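
The role of \G can be checked with a small experiment. The sketch below assumes R's `gsub` with `perl = TRUE` and compares the pattern with and without the anchor on a string that is entirely inside dollar signs. The variable names are illustrative only.

```r
# Everything after \G, as an R string literal.
body      <- "((?:[^$\\\\]|\\$[^$]*+\\$|\\$(?![^$]*+\\$))*+)\\\\"
with_G    <- paste0("\\G", body)
without_G <- body

s <- "$\\bar{x}$"   # the characters $\bar{x}$

kept   <- gsub(with_G,    "\\1\"", s, perl = TRUE)
broken <- gsub(without_G, "\\1\"", s, perl = TRUE)
cat(kept, broken, sep = "\n")
# $\bar{x}$
# $"bar{x}$
```

Without \G the engine is free to retry from the second character, which is already inside the $...$ region, so it matches the \ there. With \G the only permitted starting point is the end of the previous match, so the region is either consumed whole or not matched at all.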

Same assumption as above, except that `$<digit>` will not start a non-replacement region.

This will work even with this kind of string:

I have $50 to \spend\. I just $\bar$ remembered that I have another $30 dollars $\left$ from my last \paycheck\. Lone $ \at the end\

Pattern:

\G((?:[^$\\]|\$\d|\$(?![^$]*\$)|\$[^$]*+\$)*+)\\
"\\G((?:[^$\\\\]|\\$\\d|\\$(?![^$]*\\$)|\\$[^$]*+\\$)*+)\\\\"


\$\d is placed before \$[^$]*+\$ in the alternation so that the engine checks for that case first.
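
A minimal R sketch of this second pattern, again assuming `gsub` with `perl = TRUE`, run on the example string above and on a short string where two dollar amounts surround backslashed text. The names `pat2`, `s`, `s2`, `out2` and `out3` are illustrative only.

```r
# Second pattern as an R string literal: "$<digit>" never opens a region.
pat2 <- "\\G((?:[^$\\\\]|\\$\\d|\\$(?![^$]*\\$)|\\$[^$]*+\\$)*+)\\\\"

s <- paste0("I have $50 to \\spend\\. I just $\\bar$ remembered that I have ",
            "another $30 dollars $\\left$ from my last \\paycheck\\. ",
            "Lone $ \\at the end\\")
s2 <- "I have $20. \\hi\\ Now I have $30"

out2 <- gsub(pat2, "\\1\"", s,  perl = TRUE)
out3 <- gsub(pat2, "\\1\"", s2, perl = TRUE)
cat(out2, out3, sep = "\n")
# I have $50 to "spend". I just $\bar$ remembered that I have another $30 dollars $\left$ from my last "paycheck". Lone $ "at the end"
# I have $20. "hi" Now I have $30
```

The trade-off is that a region whose content begins with a digit, such as $3x$, would no longer be recognised as a region, so this variant only fits input where that does not occur.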

Tags:

regex

r

Tyler Rinker
