
How to use stringr's replace_all() function to replace specific matches in a string

Tags: regex, r, stringr

The stringr package has helpful str_replace() and str_replace_all() functions. For example:

library(stringr)
mystring <- "one fish two fish red fish blue fish"

str_replace(mystring, "fish", "dog") # replaces the first occurrence
str_replace_all(mystring, "fish", "dog") # replaces all occurrences

Awesome. But how do you

  1. Replace the 2nd occurrence of "fish"?
  2. Replace the last occurrence of "fish"?
  3. Replace the 2nd to last occurrence of "fish"?
Ben asked Apr 02 '16 03:04


2 Answers

For the first and last occurrences, we can use stri_replace from stringi, which has a mode argument for this:

 library(stringi)
 stri_replace(mystring, fixed="fish", "dog", mode="first")
 #[1] "one dog two fish red fish blue fish"

 stri_replace(mystring, fixed="fish", "dog", mode="last")
 #[1] "one fish two fish red fish blue dog"

The mode argument accepts only 'first', 'last' and 'all', so other positions are not covered by the function itself. For those, we have to use a regex.
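Another way around the mode limitation, sketched here rather than taken from stringi or stringr themselves, is to locate every match and overwrite only the one you want. The helper name replace_nth and its negative-n convention (counting from the end) are assumptions made for illustration, and the pattern is treated as a fixed string, not a regex.

```r
library(stringr)

# Hypothetical helper, not part of stringr: replace only the n-th fixed-string
# match; a negative n counts from the end (-1 = last, -2 = second to last).
replace_nth <- function(string, pattern, replacement, n) {
  stopifnot(length(string) == 1, n != 0)
  locs <- str_locate_all(string, fixed(pattern))[[1]]  # one row per match
  k <- nrow(locs)
  idx <- if (n > 0) n else k + n + 1
  if (idx < 1 || idx > k) return(string)  # no such occurrence: leave unchanged
  str_sub(string, locs[idx, "start"], locs[idx, "end"]) <- replacement
  string
}

mystring <- "one fish two fish red fish blue fish"
replace_nth(mystring, "fish", "dog", 2)
#[1] "one fish two dog red fish blue fish"
replace_nth(mystring, "fish", "dog", -1)
#[1] "one fish two fish red fish blue dog"
replace_nth(mystring, "fish", "dog", -2)
#[1] "one fish two fish red dog blue fish"
```

Because it works on match positions, the same call covers the "2nd to last" case without a separate pattern.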

Using sub with a negative lookahead, we can replace the 2nd occurrence of the word:

sub("^((?:(?!fish).)*fish(?:(?!fish).)*)fish", 
           "\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"

Or, to replace the 3rd occurrence, we can use

 sub('^((.*?fish.*?){2})fish', "\\1dog", mystring, perl=TRUE)
 #[1] "one fish two fish red dog blue fish"

For convenience, we can create a function that builds the pattern:

patfn <- function(n){
 stopifnot(n>1)
 sprintf("^((.*?\\bfish\\b.*?){%d})\\bfish\\b", n-1)
} 

and use it to replace the nth occurrence of 'fish' for any n > 1 (the first occurrence is easily handled by sub or by the default behaviour of str_replace):

sub(patfn(2), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"
sub(patfn(3), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red dog blue fish"
sub(patfn(4), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red fish blue dog"

This should also work with str_replace

 str_replace(mystring, patfn(2), "\\1dog")
 #[1] "one fish two dog red fish blue fish"
 str_replace(mystring, patfn(3), "\\1dog")
 #[1] "one fish two fish red dog blue fish"

Based on the pattern/replacement mentioned above, we can create a new function that covers most of these cases:

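replacerFn <- function(String, word, rword, n){
  stopifnot(n > 0)
  if (n > 1) {
    # lazily skip the first n-1 whole-word matches, then replace the next one
    pat <- sprintf(paste0("^((.*?\\b", word, "\\b.*?){%d})\\b",
                          word, "\\b"), n - 1)
    stringr::str_replace(String, pat, paste0("\\1", rword))
  } else {
    # n == 1: plain whole-word replacement of the first match
    stringr::str_replace(String, paste0("\\b", word, "\\b"), rword)
  }
}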


 replacerFn(mystring, "fish", "dog", 1)
 #[1] "one dog two fish red fish blue fish"
 replacerFn(mystring, "fish", "dog", 2)
 #[1] "one fish two dog red fish blue fish"
 replacerFn(mystring, "fish", "dog", 3)
 #[1] "one fish two fish red dog blue fish"
 replacerFn(mystring, "fish", "dog", 4)
 #[1] "one fish two fish red fish blue dog"
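
One caveat with building the pattern by pasting: word is interpreted as a regex, so a search term containing characters such as "." or "(" would need escaping first. A possible base R alternative, sketched under the assumption that a literal (fixed) match is acceptable, collects all matches with gregexpr and swaps only the nth one through regmatches<-. The name replace_nth_base is hypothetical, and unlike replacerFn this version does not enforce word boundaries.

```r
# Hypothetical helper: literal match, no regex interpretation of `word`.
replace_nth_base <- function(x, word, rword, n) {
  m <- gregexpr(word, x, fixed = TRUE)   # positions of every literal match
  hits <- regmatches(x, m)[[1]]          # the matched substrings themselves
  stopifnot(n >= 1, n <= length(hits))
  hits[n] <- rword                       # change only the n-th match
  regmatches(x, m) <- list(hits)         # write all matches back into x
  x
}

mystring <- "one fish two fish red fish blue fish"
replace_nth_base(mystring, "fish", "dog", 3)
#[1] "one fish two fish red dog blue fish"
```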
akrun answered Oct 23 '22 03:10


A useful answer depends a lot on the string and what you know about it. With regex, one option is to build a regex that matches the whole line, but in different pieces, so you can put the pieces you like back in:

str_replace(mystring, '(^.*?fish.*?)(fish)(.*?fish.*)', '\\1dog\\3')
# [1] "one fish two dog red fish blue fish"

where the \\1 and \\3 in the replacement refer to the first and third capture groups (the parenthesized parts of the pattern), respectively. Note the lazy (ungreedy) quantifiers *?, which are important so you don't overmatch.
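
To see what overmatching looks like, here is a small comparison of the same pattern with lazy and with greedy quantifiers. The variable names lazy and greedy are just for illustration.

```r
library(stringr)

mystring <- "one fish two fish red fish blue fish"

# lazy: group 1 stops at the first "fish", so the 2nd one is replaced
lazy <- str_replace(mystring, '(^.*?fish.*?)(fish)(.*?fish.*)', '\\1dog\\3')
lazy
# [1] "one fish two dog red fish blue fish"

# greedy: group 1 grabs as much as it can while still leaving two more
# "fish" for groups 2 and 3, so the 3rd one is replaced instead
greedy <- str_replace(mystring, '(^.*fish.*)(fish)(.*fish.*)', '\\1dog\\3')
greedy
# [1] "one fish two fish red dog blue fish"
```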

You can do the same thing to match the third or fourth occurrence, of course:

str_replace(mystring, '(^.*?fish.*?fish.*?)(fish)(.*)', '\\1dog\\3')
# [1] "one fish two fish red dog blue fish"
str_replace(mystring, '(^.*?fish.*?fish.*?fish.*?)(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"

This is not tremendously efficient, though. You can use quantifiers to repeat, but they make numbering the replacement groups a little confusing:

str_replace(mystring, '^((.*?fish.*?){3})(fish)(.*?)', '\\1dog\\4')
# [1] "one fish two fish red fish blue dog"

but if you make the repeated group non-capturing (?: ... ), it makes more sense:

str_replace(mystring, '^((?:.*?fish.*?){3})(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"

All of this is a lot of regex, though. A simpler option (depending on the context and how much you like regex, I suppose) may be to use strsplit and then recombine, collapsing separately:

mystrlist <- strsplit(mystring, 'fish ')[[1]] # match the space so not the last "fish$"
paste0(c(mystrlist[1], 
         paste0(mystrlist[2:3], collapse = 'dog '), 
         mystrlist[4]), 
       collapse = 'fish ')
# [1] "one fish two dog red fish blue fish"

paste0(c(mystrlist[1:2], 
         paste0(mystrlist[3:4], collapse = 'dog ')), 
       collapse = 'fish ')
# [1] "one fish two fish red dog blue fish"

This doesn't work terribly well for the last word, of course, but the end-of-line regex token $ makes using str_replace (or just sub) really easy for that purpose:

sub('fish$', 'dog', mystring)
# [1] "one fish two fish red fish blue dog"
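
When the last "fish" is not at the very end of the string, the $ anchor no longer helps. One possible workaround is to let a greedy (.*) do the counting from the right. The example string mystring2 is made up for illustration, and perl = TRUE is used so the backtracking behaviour is the familiar PCRE one.

```r
mystring2 <- "one fish two fish red fish blue fish swim"

# greedy (.*) swallows everything up to the final "fish"
last <- sub("(.*)fish", "\\1dog", mystring2, perl = TRUE)
last
# [1] "one fish two fish red fish blue dog swim"

# requiring one more "fish" in group 2 shifts the target to the 2nd to last
second_last <- sub("(.*)fish(.*fish)", "\\1dog\\2", mystring2, perl = TRUE)
second_last
# [1] "one fish two fish red dog blue fish swim"
```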

Bottom line: the best choice depends a lot on the context, but sadly there is no extra parameter for choosing which match to replace.

alistaire answered Oct 23 '22 04:10