I have a string like this:
vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose")
I want to replace "in" with "%in%", "AND" with "&", and "OR" with "|".
I know this can be done using gsub like below:
gsub("\\bin\\b", "%in%", vect)
but I would need a separate line for each of the three replacements, so I chose to use gsubfn instead.
So I tried:
gsubfn("\\bin\\b|\\bAND\\b|\\bOR\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
But it returns the string unchanged; for some reason \\b is not working for this string. However, \\b works fine with gsub, and I am able to replace all three strings by chaining gsub calls together.
My question is: why is \\b not working inside gsubfn? What am I missing in my regex?
Please help.
Output should be:
"Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
This works:
gsubfn("\\w+", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
By default, the Tcl regex engine is used; see the gsubfn docs:
If the R installation has tcltk capability then the tcl engine is used unless FUN is a proto object or perl=TRUE in which case the "R" engine is used (regardless of the setting of this argument).
So, word boundaries are defined with \y:
> gsubfn("\\y(in|AND|OR)\\y", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
Another way is to use \m as a leading word boundary and \M as a trailing word boundary:
> gsubfn("\\m(in|AND|OR)\\M", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
Alternatively, you may pass perl=TRUE and use \b:
> gsubfn("\\b(in|AND|OR)\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect, perl=TRUE)
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
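Since the docs quoted above describe an engine setting, a further option is to request the R engine explicitly instead of relying on perl=TRUE to switch it. The following is only a sketch: it assumes the gsubfn package is installed and that its engine argument accepts "R", as the quoted documentation suggests. The variable names repl and res are illustrative.

```r
library(gsubfn)

vect <- "Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose"

# Lookup table: each matched word is replaced by the value stored under its name.
repl <- list("in" = "%in%", "AND" = "&", "OR" = "|")

# engine = "R" asks gsubfn to use R's regex engine, where \\b is a word boundary
# (in Tcl's regex syntax \b means a backspace character instead).
res <- gsubfn("\\b(in|AND|OR)\\b", repl, vect, engine = "R")
print(res)
```

Setting options(gsubfn.engine = "R") once per session should have the same effect for later calls, under the same assumption.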
Add perl = TRUE; that should do it.
gsubfn("\\bin\\b|\\bAND\\b|\\bOR\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect, perl = TRUE)
Output
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
From the gsub documentation:
The POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).
And from the gsubfn documentation:
... Other gsub arguments.
This doesn't explain why gsub works fine without the perl argument while gsubfn needs perl = TRUE.
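A likely explanation is the engine difference: gsub always runs on R's own regex engines, where \b is a word boundary, whereas gsubfn defaults to the Tcl engine when tcltk is available, and in Tcl's regex syntax \b stands for a backspace character. For comparison, here is a minimal base R sketch that performs all three replacements without gsubfn by folding gsub over a named vector. The names repl and out are illustrative, not from the original post.

```r
vect <- "Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose"

# Named character vector: names are the words to find, values are the replacements.
repl <- c("in" = "%in%", "AND" = "&", "OR" = "|")

# Apply one gsub per entry, feeding each result into the next call.
# vect is the starting value; \\b restricts each match to whole words.
out <- Reduce(
  function(s, key) gsub(paste0("\\b", key, "\\b"), repl[[key]], s),
  names(repl),
  vect
)
print(out)
```

This keeps the per-word patterns in one place, at the cost of scanning the string once per replacement rather than once overall.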