
How to replace one substring with different substrings in R?

Tags:

r

I have a vector of strings and I want to replace one common substring in all the strings with different substrings. I'm doing this in R. For example:

input=c("I like fruits","I like you","I like dudes")
# I need to do something like this
newStrings=c("You","We","She")
gsub("I",newStrings,input)

so that the output should look like:

"You like fruits"
"We like you"
"She like dudes"

However, gsub uses only the first string in newStrings. Any suggestions? Thanks

Mohammad asked Aug 04 '16 17:08


1 Answer

You can use stringr, whose str_replace_all is vectorised over the replacement as well as the input:

stringr::str_replace_all(input, "I", newStrings)

[1] "You like fruits" "We like you"    
[3] "She like dudes"

or, as suggested by @David Arenburg:

stringi::stri_replace_all_fixed(input, "I", newStrings)
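
If you would rather stay in base R, here is a minimal sketch of the same idea. gsub() and sub() are vectorised over x only, so the replacements have to be paired with the inputs explicitly, for example with mapply. The anchored pattern "^I\\b" and the third input string are my own assumptions for illustration: an unanchored "I" with gsub would also hit the capital I inside a word like "Iceland".

```r
# Toy input; the third string is changed from the question to show the pitfall
input <- c("I like fruits", "I like you", "I like Iceland")
newStrings <- c("You", "We", "She")

# mapply calls sub(pattern, newStrings[i], input[i]) for each i.
# "^I\\b" (an assumption about the intent) matches only a leading word "I",
# and sub() replaces at most one match per string.
out <- mapply(sub, pattern = "^I\\b", replacement = newStrings, x = input,
              USE.NAMES = FALSE)
out
# [1] "You like fruits"  "We like you"      "She like Iceland"
```

If every occurrence of a fixed substring really should be replaced, swap sub for gsub and pass fixed = TRUE, as in the benchmark below.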

Benchmark

library(stringi)
library(stringr)
library(microbenchmark)

set.seed(123)
x <- stri_rand_strings(1e3, 10)
y <- stri_rand_strings(1e3, 1)

identical(stringi::stri_replace_all_fixed(x, "I", y), stringr::str_replace_all(x, fixed("I"), y))
# [1] TRUE
identical(stringi::stri_replace_all_fixed(x, "I", y), diag(sapply(y, gsub, pattern = "I", x = x, fixed = TRUE)))
# [1] TRUE
identical(stringi::stri_replace_all_fixed(x, "I", y), mapply(gsub, "I", y, x, USE.NAMES = FALSE, fixed = TRUE))
# [1] TRUE

microbenchmark("stringi: " = stringi::stri_replace_all_fixed(x, "I", y),
               "stringr (optimized): " = stringr::str_replace_all(x, fixed("I"), y),
               "base::mapply (optimized): " = mapply(gsub, "I", y, x, USE.NAMES = FALSE, fixed = TRUE),
               "base::sapply (optimized): " = diag(sapply(y, gsub, pattern = "I", x = x, fixed = TRUE)))

# Unit: microseconds
#                       expr        min          lq        mean      median          uq        max neval cld
#                  stringi:     132.156    137.1165    171.5822    150.3960    194.2345    460.145   100  a
#      stringr (optimized):     801.894    828.7730    947.1813    912.6095    968.7680   2716.708   100  a
# base::mapply (optimized):    2827.104   2946.9400   3211.9614   3031.7375   3123.8940   8216.360   100  a
# base::sapply (optimized):  402349.424 476545.9245 491665.8576 483410.3290 513184.3490 549489.667   100   b
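
A likely reason the sapply variant is so much slower: it runs gsub once per replacement over the whole of x, builds a length(x) by length(y) matrix, and then keeps only the diagonal. The small sketch below uses made-up toy vectors (not the benchmark data) to make that visible.

```r
# Toy data, chosen only for illustration
x <- c("I a", "I b", "I c")
y <- c("P", "Q", "R")

# sapply calls gsub once per element of y, each time over ALL of x,
# so the result is a length(x) by length(y) matrix of candidates
m <- sapply(y, gsub, pattern = "I", x = x, fixed = TRUE)
dim(m)
# [1] 3 3

# Only the diagonal pairs x[i] with y[i]; everything else is discarded
via_diag <- diag(m)

# mapply pairs x[i] with y[i] directly: length(x) calls, nothing thrown away
via_mapply <- mapply(gsub, "I", y, x, USE.NAMES = FALSE, fixed = TRUE)

identical(via_diag, via_mapply)
# [1] TRUE
```

With the 1e3 strings used in the benchmark, that matrix holds 1e6 results of which only 1e3 are kept.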
Sumedh answered Sep 30 '22 16:09