 

Avoid for loop in string replacement?

Tags:

r

I've got data, a character vector (eventually I'll collapse it, so I don't care whether it stays a vector or is treated as a single string), a vector of patterns, and a vector of replacements. I want each pattern in the data to be replaced by its respective replacement. I got it done with stringr and a for loop, but is there a more R-like way to do it?

library(stringr)
start_string <- sample(letters[1:10], 10)
my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")
str_replace(start_string, pattern = my_pattern, replacement = my_replacement)
# bad lengths, doesn't work

str_replace(paste0(start_string, collapse = ""),
    pattern = my_pattern, replacement = my_replacement)
# vector output, not what I want in this case

my_result <- start_string
for (i in seq_along(my_pattern)) {
    my_result <- str_replace(my_result,
        pattern = my_pattern[i], replacement = my_replacement[i])
}
> my_result
 [1] "[this was a c]"  "[this was an a]" "e"               "g"               "h"               "[this was a b]" 
 [7] "d"               "j"               "f"               "i"   

# This is what I want, but is there a better way?

In my case, I know each pattern will occur at most once, but not every pattern will occur. I know I could use str_replace_all if patterns might occur more than once; I hope a solution would also provide that option. I'd also like a solution that uses my_pattern and my_replacement so that it could be part of a function with those vectors as arguments.

Gregor Thomas asked Oct 05 '22 00:10



2 Answers

I'll bet there's another way to do this, but my first thought was gsubfn:

my_repl <- function(x){
    switch(x,a = "[this was an a]",
             b = "[this was a b]",
             c = "[this was a c]",
             z = "[this was a z]")
}

library(gsubfn)    
start_string <- sample(letters[1:10], 10)
gsubfn("a|b|c|z",my_repl,x = start_string)

If the patterns you are searching for are valid names for list elements, this will also work:

names(my_replacement) <- my_pattern
gsubfn("a|b|c|z",as.list(my_replacement),start_string)

Edit

But frankly, if I really had to do this a lot in my own code, I would probably just do the for loop thing, wrapped in a function. Here's a simple version using sub and gsub rather than the functions from stringr:

vsub <- function(pattern,replacement,x,all = TRUE,...){
  FUN <- if (all) gsub else sub
  for (i in seq_len(min(length(pattern),length(replacement)))){
    x <- FUN(pattern = pattern[i],replacement = replacement[i],x,...)
  }
  x
}

vsub(my_pattern,my_replacement,start_string)
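For what it's worth, the same sequential replacement can also be written without an explicit loop by folding over the pattern/replacement pairs with base R's Reduce(). This is only a sketch: vsub_reduce is a hypothetical name, not something from the answer, and the input vector is a fixed illustrative one rather than the sample() draw from the question. It behaves like the loop, so it is just as order dependent.

```r
# Hypothetical helper (not part of the answer above): the same sequential
# replacement as vsub(), written as a left fold with Reduce().
vsub_reduce <- function(pattern, replacement, x, all = TRUE, ...) {
  FUN <- if (all) gsub else sub
  n <- min(length(pattern), length(replacement))
  Reduce(
    function(acc, i) FUN(pattern[i], replacement[i], acc, ...),
    seq_len(n),
    init = x  # start from the untouched input vector
  )
}

my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")

# Illustrative input, fixed so the result is reproducible.
result <- vsub_reduce(my_pattern, my_replacement, c("c", "a", "e"))
result
```

Whether this reads as "more R-like" than the loop is a matter of taste; it does the same work in the same order.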

But of course, one of the reasons there isn't a well-known built-in function for this is probably that sequential replacements like this can be pretty fragile, because they are so order dependent:

vsub(rev(my_pattern),rev(my_replacement),start_string)
 [1] "i"                                          "[this w[this was an a]s [this was an a] c]"
 [3] "[this was an a]"                            "g"                                         
 [5] "j"                                          "d"                                         
 [7] "f"                                          "[this w[this was an a]s [this was an a] b]"
 [9] "h"                                          "e"      
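A smaller illustration of the same order dependence, as a sketch with made-up input: trying to swap "a" and "b" with two sequential gsub() calls fails, because the second call also rewrites what the first one produced. For single characters, base R's chartr() translates in one pass and does not have this problem.

```r
x <- c("a", "b")

# Sequential replacement: the second step also rewrites the output of the first.
step1 <- gsub("a", "b", x)            # "b" "b"
swapped_seq <- gsub("b", "a", step1)  # "a" "a"

# Single-pass translation of single characters with chartr().
swapped_once <- chartr("ab", "ba", x) # "b" "a"
```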
joran answered Oct 13 '22 10:10



Here's an option based on gregexpr, regmatches, and regmatches<-. Do be aware that there are limits to the length of regular expressions that can be matched, so this won't work if you try to match too many long patterns with it.

replaceSubstrings <- function(patterns, replacements, X) {
    pat <- paste(patterns, collapse="|")
    m <- gregexpr(pat, X)
    regmatches(X, m) <- 
        lapply(regmatches(X,m),
               function(XX) replacements[match(XX, patterns)])
    X
}

## Try it out
patterns <- c("cat", "dog")
replacements <- c("tiger", "coyote")
sentences <- c("A cat", "Two dogs", "Raining cats and dogs")
replaceSubstrings(patterns, replacements, sentences)
## [1] "A tiger"                    "Two coyotes"               
## [3] "Raining tigers and coyotes"
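As a rough check against the data from the question (a sketch, using a fixed illustrative input instead of sample()), the function can be called with the pattern and replacement vectors in either order. Because every match is found in the original text in a single pass, the result should not depend on the order of the vectors. Note the assumption that the patterns are plain strings: match() compares the matched text with the pattern strings literally, so patterns containing regex metacharacters would not be looked up correctly.

```r
# Copied from the answer above so the example is self-contained.
replaceSubstrings <- function(patterns, replacements, X) {
    pat <- paste(patterns, collapse = "|")
    m <- gregexpr(pat, X)
    regmatches(X, m) <-
        lapply(regmatches(X, m),
               function(XX) replacements[match(XX, patterns)])
    X
}

my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")
x <- c("c", "a", "e", "b")  # illustrative input, fixed instead of sample()

forward  <- replaceSubstrings(my_pattern, my_replacement, x)
backward <- replaceSubstrings(rev(my_pattern), rev(my_replacement), x)

# Compare the two orderings; elements without a match are left unchanged.
identical(forward, backward)
```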
Josh O'Brien answered Oct 13 '22 12:10
