Given a string, I need to make many substitutions for different patterns:
library(data.table)

subst <- fread(c("
regex ; replacement
abc\\w*\\b ; alphabet
red ; color
\\d+ ; number
")
, sep = ";"
)
> subst
        regex replacement
1: abc\\w*\\b    alphabet
2:        red       color
3:       \\d+      number
So, for the string text <- c("abc 24 red bcd"), the expected output would be:
alphabet number color bcd
I tried the following code:
mapply(function(x, y) gsub(x, y, text, perl = TRUE)
       , subst$regex
       , subst$replacement
)
The output I got:
"alphabet 24 red bcd" "abc 24 color bcd" "abc number red bcd"
This code performs each substitution one at a time, and not all at once. What should I do to get the expected result?
You can perform multiple substitutions by passing a named character vector to stringr::str_replace_all().
library(stringr)
str_replace_all(text, setNames(subst$replacement, subst$regex))
# "alphabet number color bcd"
As an alternative to setNames(), you could convert your table to a named vector using tibble::deframe().
library(stringr)
library(tibble)
str_replace_all(text, deframe(subst))
# "alphabet number color bcd"
I think zephryl's answer is a great one-step solution.
The reason your mapply solution doesn't work is that each iteration starts from the original value of text. It does not operate on the result of the previous replacement, so you get one independently modified copy per pattern.
For that, we can use Reduce:
Reduce(function(txt, i) gsub(subst$regex[i], subst$replacement[i], txt, perl = TRUE),
       seq_len(nrow(subst)), init = text)
# [1] "alphabet number color bcd"
We can see what's happening step-by-step by adding accumulate=TRUE:
Reduce(function(txt, i) gsub(subst$regex[i], subst$replacement[i], txt, perl = TRUE),
       seq_len(nrow(subst)), init = text, accumulate = TRUE)
# [1] "abc 24 red bcd" "alphabet 24 red bcd"
# [3] "alphabet 24 color bcd" "alphabet number color bcd"
In fact, as @thelatemail's recent comment and link show, they posted a nearly identical answer in 2014. The only difference is how it handles a reduction over two vectors (the two columns of subst). Both methods work equally well; use whichever reads more easily to you:
Reduce(function(txt, ptn) gsub(ptn[1], ptn[2], txt, perl = TRUE),
       Map(c, subst$regex, subst$replacement), init = text)
# [1] "alphabet number color bcd"
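If Reduce feels opaque, the same carry-forward idea can be sketched as a plain for loop. This is only an illustration, not a different method: it rebuilds subst as a base data.frame (an assumption, so the snippet runs without data.table) and introduces a hypothetical variable named result to hold the running text.

```r
# Stand-in for the question's table, built with base R only (assumption:
# same three regex/replacement pairs, in the same order).
subst <- data.frame(
  regex       = c("abc\\w*\\b", "red", "\\d+"),
  replacement = c("alphabet", "color", "number"),
  stringsAsFactors = FALSE
)
text <- c("abc 24 red bcd")

# Start from the original text and feed each gsub() the output of the
# previous one, instead of always starting over from `text`.
result <- text
for (i in seq_len(nrow(subst))) {
  result <- gsub(subst$regex[i], subst$replacement[i], result, perl = TRUE)
}
result
# [1] "alphabet number color bcd"
```

Because each step sees the previous step's output, the row order of subst matters: a later pattern can also match text that an earlier replacement inserted.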