Does anyone know how to find the n-th occurrence of a string within an expression and how to replace it using a regular expression?
For example, I have the following string:
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
and I want to replace the 5th occurrence of '-' with '|' and the 7th occurrence of '-' with "||", like this:
[1] aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa
How do I do this?
Thanks, Florian
(1) sub. It can be done in a single regular expression with sub:
> sub("(^(.*?-){4}.*?)-(.*?-.*?)-", "\\1|\\3||", txt, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
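To see why the replacement uses \\1 and \\3, it may help to look at what each capture group holds. The following is only a sketch for inspecting the match; it assumes a version of R in which regexec accepts perl = TRUE (R 3.3.0 or later, as far as I know), and the variable names pattern and groups are mine, not part of the answer.

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
pattern <- "(^(.*?-){4}.*?)-(.*?-.*?)-"

# regexec() returns the full match followed by one entry per capture group.
groups <- regmatches(txt, regexec(pattern, txt, perl = TRUE))[[1]]

groups[2]  # group 1: everything before the 5th "-"
groups[3]  # group 2: the last repetition of the inner (.*?-) group
groups[4]  # group 3: the text between the 5th and the 7th "-"
```

Group 2 only exists so that the inner piece can be repeated four times, which is why the replacement string skips it.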
(2) sub twice. This variation calls sub twice, first for the 7th occurrence and then for the 5th:
> txt2 <- sub("(^(.*?-){6}.*?)-", "\\1||", txt, perl = TRUE)
> sub("(^(.*?-){4}.*?)-", "\\1|", txt2, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
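The order of the two calls matters because every replacement removes one "-" from the string. As a rough illustration (the names step1 and result are only for this sketch), replacing the 5th occurrence first means the original 7th "-" has become the 6th remaining one, so the repeat count in the second pattern has to drop by one:

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# Replace the 5th "-" first ...
step1 <- sub("(^(.*?-){4}.*?)-", "\\1|", txt, perl = TRUE)

# ... so the original 7th "-" is now the 6th one left: skip 5, not 6.
result <- sub("(^(.*?-){5}.*?)-", "\\1||", step1, perl = TRUE)
result
```

Working from the highest occurrence down to the lowest avoids this bookkeeping, since occurrences to the left of a replacement keep their positions.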
(3) sub.fun. This variation creates a function sub.fun which performs one substitution. It makes use of fn$ from the gsubfn package to substitute n-1, pat, and value into the sub arguments. First define the indicated function and then call it twice.
library(gsubfn)
sub.fun <- function(x, pat, n, value) {
  # use $pat in the repeated group as well, so pat is not hard-coded as "-"
  fn$sub("(^(.*?$pat){`n-1`}.*?)$pat", "\\1$value", x, perl = TRUE)
}
> sub.fun(sub.fun(txt, "-", 7, "||"), "-", 5, "|")
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(We could have modified the arguments to sub in the body of sub.fun using paste or sprintf to give a base R solution, but at the expense of some additional verbosity.)
This can be reformulated as a replacement function giving this pleasing sequence:
"sub.fun<-" <- sub.fun
tt <- txt # make a copy so that we preserve the input txt
sub.fun(tt, "-", 7) <- "||"
sub.fun(tt, "-", 5) <- "|"
> tt
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
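For reference, the base R variant of (3) mentioned in the parenthetical above might look roughly like this. It is only a sketch: sub_nth is a name made up here, pat is pasted into the pattern unescaped (so it is treated as a regular expression), and value is used as a replacement string (so backslashes in it would be interpreted).

```r
# Base R sketch of sub.fun: build the pattern with sprintf() instead of fn$.
# sub_nth is a hypothetical name, not part of gsubfn or base R.
sub_nth <- function(x, pat, n, value) {
  # Skip n - 1 occurrences of pat, capture up to the n-th, then replace it.
  pattern <- sprintf("(^(.*?%s){%d}.*?)%s", pat, n - 1, pat)
  sub(pattern, paste0("\\1", value), x, perl = TRUE)
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
res <- sub_nth(sub_nth(txt, "-", 7, "||"), "-", 5, "|")
res
```

As with sub.fun, the 7th occurrence is handled before the 5th so that the counts do not shift between calls.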
(4) gsubfn. Using gsubfn from the gsubfn package we can use a particularly simple regular expression (it's just "-") and the code has quite a straightforward structure. We perform the substitution via a proto method. The proto object containing the method is passed in place of a replacement string. The simplicity of this approach derives from the fact that gsubfn automatically makes a count variable available to such methods:
library(gsubfn) # gsubfn also pulls in proto
p <- proto(fun = function(this, x) {
  if (count == 5) return("|")
  if (count == 7) return("||")
  x
})
> gsubfn("-", p, txt)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
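If you would rather not depend on gsubfn, a similar "count the matches" idea can probably be expressed in base R with gregexpr and the replacement form of regmatches. This is a different technique from the proto method above, offered only as a sketch; the names m, hits and out are mine.

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# Locate every "-" and pull the matches out as a character vector.
m <- gregexpr("-", txt)
hits <- regmatches(txt, m)[[1]]

# Overwrite only the 5th and 7th matches; the others stay as "-".
hits[c(5, 7)] <- c("|", "||")

# Write the (partly modified) matches back into a copy of the string.
out <- txt
regmatches(out, m) <- list(hits)
out
```

Because all matches are located up front, both replacements refer to positions in the original string and the order in which they are assigned does not matter.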
UPDATE: Some corrections.
UPDATE 2: Added a replacement function approach to (3).
UPDATE 3: Added pat argument to sub.fun.