
Regular Expression: replace the n-th occurrence

Tags:

r

Does anyone know how to find the n-th occurrence of a string within a longer string and replace it using a regular expression?

For example, I have the following string:

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

and I want to replace the 5th occurrence of '-' with '|' and the 7th occurrence of '-' with "||", like this:

[1] aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa

How do I do this?

Thanks, Florian

floe asked May 28 '13 10:05


1 Answer

(1) sub. It can be done with a single regular expression in one call to sub:

> sub("(^(.*?-){4}.*?)-(.*?-.*?)-", "\\1|\\3||", txt, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
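
To see why this pattern lands on the right dashes, it can help to inspect what each group captures. The following is a minimal sketch, assuming R 3.3.0 or later (where regexec accepts perl = TRUE); the variable names pattern and parts are only for illustration.

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
pattern <- "(^(.*?-){4}.*?)-(.*?-.*?)-"

# regexec reports the positions of the whole match and of each capture group;
# regmatches turns those positions into the matched substrings.
parts <- regmatches(txt, regexec(pattern, txt, perl = TRUE))[[1]]

parts[1]  # whole match: everything up to and including the 7th "-"
parts[2]  # group 1: the text before the 5th "-"
parts[4]  # group 3: the text between the 5th and 7th "-" (contains the 6th)
```

The element parts[3] holds group 2, which only exists to repeat "some text followed by a dash" four times, so it is never used in the replacement string.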

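(2) sub twice. This variation calls sub twice, once per replacement. The first call replaces the 7th '-' with "||" and the second replaces the 5th '-' with "|":

> txt2 <- sub("(^(.*?-){6}.*?)-", "\\1||", txt, perl = TRUE)
> sub("(^(.*?-){4}.*?)-", "\\1|", txt2, perl = TRUE)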
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

(3) sub.fun. This variation defines a function, sub.fun, that performs a single substitution. It uses fn$ from the gsubfn package to interpolate n-1, pat, and value into the arguments of sub. First define the function, then call it twice:

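library(gsubfn)
# Replace the n-th occurrence of pat in x with value.
# fn$ interpolates `n-1`, $pat and $value into the string arguments of sub.
sub.fun <- function(x, pat, n, value) {
   fn$sub( "(^(.*?$pat){`n-1`}.*?)$pat", "\\1$value", x, perl = TRUE)
}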

> sub.fun(sub.fun(txt, "-", 7, "||"), "-", 5, "|")
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

(We could have modified the arguments to sub in the body of sub.fun using paste or sprintf to give a base R solution but at the expense of some additional verbosity.)
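
A minimal sketch of what such a base R version might look like, building the pattern with sprintf and the replacement with paste0. The name sub_nth is hypothetical, and the sketch assumes that pat is a valid regular expression (metacharacters in it would need escaping) and that value contains no backslashes.

```r
# Hypothetical base R counterpart of sub.fun: replace the n-th match of pat.
sub_nth <- function(x, pat, n, value) {
  # Lazily skip n-1 occurrences of pat, then match the next one.
  regex <- sprintf("(^(.*?%s){%d}.*?)%s", pat, as.integer(n) - 1L, pat)
  # "\\1" puts back everything before the n-th match; value replaces the match.
  sub(regex, paste0("\\1", value), x, perl = TRUE)
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
res <- sub_nth(sub_nth(txt, "-", 7, "||"), "-", 5, "|")
res
```

Replacing the later occurrence first keeps the count for the earlier one unaffected, which matters whenever a replacement value could itself contain the pattern.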

This can be reformulated as a replacement function giving this pleasing sequence:

"sub.fun<-" <- sub.fun
tt <- txt # make a copy so that we preserve the input txt
sub.fun(tt, "-", 7) <- "||"
sub.fun(tt, "-", 5) <- "|"

> tt
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

(4) gsubfn. Using gsubfn from the gsubfn package, we can use a particularly simple regular expression (it's just "-"), and the code has quite a straightforward structure. We perform the substitution via a proto method: the proto object containing the method is passed in place of a replacement string. The simplicity of this approach derives from the fact that gsubfn automatically makes a count variable available to such methods:

library(gsubfn) # gsubfn also pulls in proto
p <- proto(fun = function(this, x) {
     if (count == 5) return("|")
     if (count == 7) return("||")
     x
 })

> gsubfn("-", p, txt)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
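
The same "count the matches" idea can also be sketched in base R with gregexpr and the replacement form of regmatches. This is not part of gsubfn, only an illustration of the technique; the helper name replace_nth is hypothetical, and the sketch assumes a single input string with a fixed (non-regex) pattern.

```r
# Hypothetical helper: replace the n[i]-th match of pat in x with value[i].
replace_nth <- function(x, pat, n, value) {
  stopifnot(length(x) == 1, length(n) == length(value))
  m <- gregexpr(pat, x, fixed = TRUE)   # positions of all matches
  hits <- regmatches(x, m)              # list holding the matched substrings
  ok <- n <= length(hits[[1]])          # ignore indices beyond the last match
  hits[[1]][n[ok]] <- value[ok]         # overwrite only the selected matches
  regmatches(x, m) <- hits              # write the matches back into x
  x
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
out <- replace_nth(txt, "-", c(5, 7), c("|", "||"))
out
```

Because all match positions are computed once on the original string, the replacements cannot shift each other's counts, so their order does not matter here.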

UPDATE: Some corrections.

UPDATE 2: Added a replacement function approach to (3).

UPDATE 3: Added pat argument to sub.fun.

G. Grothendieck answered Nov 15 '22 06:11