The stringr package has helpful str_replace() and str_replace_all() functions. For example:
mystring <- "one fish two fish red fish blue fish"
str_replace(mystring, "fish", "dog") # replaces the first occurrence
str_replace_all(mystring, "fish", "dog") # replaces all occurrences
Awesome. But how do you replace only one specific occurrence, such as the second or the last?
For the first and last occurrence, we can use stri_replace from stringi, as it has a mode option:
library(stringi)
stri_replace(mystring, fixed="fish", "dog", mode="first")
#[1] "one dog two fish red fish blue fish"
stri_replace(mystring, fixed="fish", "dog", mode="last")
#[1] "one fish two fish red fish blue dog"
The mode argument only accepts 'first', 'last' and 'all', so other positions are not covered by the built-in function. For those we have to fall back on a regex.
Using sub, we can replace the nth occurrence of a word. For example, the second one:
sub("^((?:(?!fish).)*fish(?:(?!fish).)*)fish",
"\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"
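The pattern above is easier to read when it is built from its parts. The sketch below is one way to do that and to generalise it to any n; the names tempered and nth_pattern are made up for illustration and are not part of any package.

```r
mystring <- "one fish two fish red fish blue fish"

# "(?:(?!fish).)*" consumes characters one at a time, but only while
# the text at the current position does not start with "fish"
tempered <- "(?:(?!fish).)*"

# hypothetical helper: keep the first n-1 occurrences in group 1,
# then match the nth "fish" right after it
nth_pattern <- function(n) {
  stopifnot(n >= 1)
  sprintf("^((?:%sfish){%d}%s)fish", tempered, n - 1, tempered)
}

second <- sub(nth_pattern(2), "\\1dog", mystring, perl = TRUE)
third  <- sub(nth_pattern(3), "\\1dog", mystring, perl = TRUE)
```

With n = 2 this produces the same pattern as the hand-written one above. The negative lookahead needs perl = TRUE in base R.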
Or, for the third occurrence, we can use
sub('^((.*?fish.*?){2})fish', "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red dog blue fish"
For convenience, we can create a function that builds the pattern
patfn <- function(n){
  stopifnot(n>1)
  sprintf("^((.*?\\bfish\\b.*?){%d})\\bfish\\b", n-1)
}
and use it to replace the nth occurrence of 'fish'. The first occurrence is excluded because it is easily handled by plain sub or by the default behaviour of str_replace
sub(patfn(2), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"
sub(patfn(3), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red dog blue fish"
sub(patfn(4), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red fish blue dog"
This also works with str_replace
str_replace(mystring, patfn(2), "\\1dog")
#[1] "one fish two dog red fish blue fish"
str_replace(mystring, patfn(3), "\\1dog")
#[1] "one fish two fish red dog blue fish"
Based on the pattern and replacement above, we can create a more general function that covers most cases
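replacerFn <- function(String, word, rword, n){
  stopifnot(n > 0)
  # whole-word pattern: keep n-1 occurrences in group 1, then match the nth
  pat <- sprintf(paste0("^((.*?\\b", word, "\\b.*?){%d})\\b",
                        word, "\\b"), n-1)
  rpat <- paste0("\\1", rword)
  if(n > 1) {
    stringr::str_replace(String, pat, rpat)
  } else {
    # first occurrence: nothing to keep in front, but still match whole words only
    stringr::str_replace(String, paste0("\\b", word, "\\b"), rword)
  }
}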
replacerFn(mystring, "fish", "dog", 1)
#[1] "one dog two fish red fish blue fish"
replacerFn(mystring, "fish", "dog", 2)
#[1] "one fish two dog red fish blue fish"
replacerFn(mystring, "fish", "dog", 3)
#[1] "one fish two fish red dog blue fish"
replacerFn(mystring, "fish", "dog", 4)
#[1] "one fish two fish red fish blue dog"
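One caveat with replacerFn is that word is pasted into the pattern as a regex, so a word such as "a.b" would also match "axb". A possible variant, sketched here with base R's sub and PCRE's \Q...\E quoting, treats the word literally. The name replace_nth_literal is hypothetical, and the sketch assumes that word does not itself contain \E and that rword contains no backslashes.

```r
# hypothetical helper: replace the nth literal occurrence of `word`
# note: unlike replacerFn, this sketch does not add \b word boundaries
replace_nth_literal <- function(string, word, rword, n) {
  stopifnot(n >= 1)
  lit <- paste0("\\Q", word, "\\E")  # PCRE: match everything between \Q and \E literally
  # group 1 keeps the first n-1 occurrences plus the text up to the nth
  pat <- sprintf("^((?:.*?%s){%d}.*?)%s", lit, n - 1, lit)
  sub(pat, paste0("\\1", rword), string, perl = TRUE)
}

res <- replace_nth_literal("a.b axb a.b a.b", "a.b", "X", 2)
res
# [1] "a.b axb X a.b"
```

If the string has fewer than n occurrences, sub finds no match and returns the input unchanged.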
A useful answer depends a lot on the string and what you know about it. With regex, one option is to build a regex that matches the whole line, but in different pieces, so you can put the pieces you like back in:
str_replace(mystring, '(^.*?fish.*?)(fish)(.*?fish.*)', '\\1dog\\3')
# [1] "one fish two dog red fish blue fish"
where the \\1 and \\3 in the replacement refer to the first and third capture groups, respectively. Note the lazy (ungreedy) quantifiers *?, which are important so you don't overmatch.
You can do the same thing to match the third or fourth occurrence, of course:
str_replace(mystring, '(^.*?fish.*?fish.*?)(fish)(.*)', '\\1dog\\3')
# [1] "one fish two fish red dog blue fish"
str_replace(mystring, '(^.*?fish.*?fish.*?fish.*?)(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"
This is not tremendously efficient, though. You can use quantifiers to repeat, but they make numbering the replacement groups a little confusing:
str_replace(mystring, '^((.*?fish.*?){3})(fish)(.*?)', '\\1dog\\4')
# [1] "one fish two fish red fish blue dog"
but if you make the repeated group non-capturing with (?: ... ), it makes more sense:
str_replace(mystring, '^((?:.*?fish.*?){3})(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"
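To see why the numbering shifts, it can help to list what each group actually captured. The sketch below uses base R's regexec and regmatches (with perl = TRUE, since the second pattern uses a non-capturing group) rather than stringr; the variable names are only illustrative.

```r
mystring <- "one fish two fish red fish blue fish"

capturing     <- '^((.*?fish.*?){3})(fish)(.*?)'
non_capturing <- '^((?:.*?fish.*?){3})(fish)(.*?)'

# element 1 is the whole match; the remaining elements are groups 1, 2, ...
groups_cap    <- regmatches(mystring, regexec(capturing, mystring, perl = TRUE))[[1]]
groups_noncap <- regmatches(mystring, regexec(non_capturing, mystring, perl = TRUE))[[1]]

length(groups_cap) - 1     # 4 groups: the repeated group takes number 2
length(groups_noncap) - 1  # 3 groups: "fish" is group 2, the tail is group 3
```

In both cases the trailing lazy (.*?) captures an empty string, which is why the rest of the string is left untouched by the replacement.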
All of this is a lot of regex, though. A simpler option (depending on the context and how much you like regex, I suppose) may be to use strsplit and then recombine, collapsing separately:
mystrlist <- strsplit(mystring, 'fish ')[[1]] # split on "fish " (with the space) so the final "fish" is not a split point
paste0(c(mystrlist[1],
paste0(mystrlist[2:3], collapse = 'dog '),
mystrlist[4]),
collapse = 'fish ')
# [1] "one fish two dog red fish blue fish"
paste0(c(mystrlist[1:2],
paste0(mystrlist[3:4], collapse = 'dog ')),
collapse = 'fish ')
# [1] "one fish two fish red dog blue fish"
This doesn't work terribly well for the last word, of course, but the end-of-line regex token $ makes using str_replace (or just sub) really easy for that purpose:
sub('fish$', 'dog', mystring)
# [1] "one fish two fish red fish blue dog"
Bottom line: the best choice depends a lot on the context, but sadly there is no extra parameter for choosing which match to replace.
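As a sketch of what such a parameter could look like, the helper below locates all matches with base R's gregexpr and swaps one of them through the regmatches replacement function. The name replace_occurrence and the convention that a negative n counts from the end are assumptions made here for illustration, not part of stringr or stringi.

```r
# hypothetical helper: replace the nth fixed-string match of `word` in a single string
# n > 0 counts from the start, n < 0 counts from the end (-1 is the last match)
replace_occurrence <- function(x, word, rword, n) {
  m <- gregexpr(word, x, fixed = TRUE)   # positions of all literal matches
  hits <- regmatches(x, m)[[1]]          # the matched substrings, in order
  k <- length(hits)
  idx <- if (n < 0) k + n + 1 else n
  if (idx < 1 || idx > k) return(x)      # no such occurrence: return input unchanged
  hits[idx] <- rword
  regmatches(x, m) <- list(hits)         # write the (partly changed) matches back
  x
}

mystring <- "one fish two fish red fish blue fish"
replace_occurrence(mystring, "fish", "dog", 2)
# [1] "one fish two dog red fish blue fish"
replace_occurrence(mystring, "fish", "dog", -1)
# [1] "one fish two fish red fish blue dog"
```

Because the match is located by position rather than by a growing regex, the same call handles the first, the last and anything in between.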