Let me use the following example to illustrate.
str = "we are friends"
The help doc says that
The symbols \< and \> match the empty string at the beginning and end of a word.
So the following behaves as expected: a space is added at the end of each word.
gsub("\\>"," ", str)
[1] "we  are  friends "
However, why doesn't it work when using
gsub("\\<"," ", str)
[1] " w e  a r e  f r i e n d s"
Can someone explain why this happens, and what I need to do if I want an extra space added in front of every word?
It is pretty strange but I think this is documented as a warning:
POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).
So, use \\b(?=\\w) or (?<!\\w)\\b with perl=TRUE:
str = "we are friends"
gsub('(?<!\\w)\\b', ' ', str, perl=TRUE)
Output:
[1] " we  are  friends"
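To compare the options side by side, here is a minimal sketch. It runs both lookaround patterns mentioned above with perl = TRUE, plus one alternative that is not part of the suggestion above: capturing each word with (\\w+) and re-emitting it with a leading space. That alternative avoids zero-width matches altogether, so it should not need the PCRE engine. The variable names (s, lookahead, lookbehind, captured) are only illustrative, and s is used instead of str to avoid shadowing the built-in str function.

```r
# Input string from the question (named `s` here, an illustrative choice).
s <- "we are friends"

# Two PCRE patterns that match the empty string at the start of a word.
# Both need perl = TRUE, because they rely on lookarounds.
lookahead  <- gsub("\\b(?=\\w)",  " ", s, perl = TRUE)
lookbehind <- gsub("(?<!\\w)\\b", " ", s, perl = TRUE)

# Alternative (not from the answer above): capture each whole word and
# put a space in front of it, so no zero-width match is involved.
captured <- gsub("(\\w+)", " \\1", s)

print(lookahead)
print(identical(lookahead, lookbehind))
print(identical(lookahead, captured))
```

All three calls should produce the same string for this ASCII input. As the warning quoted above notes, what counts as a word character may differ for non-ASCII text, so it is worth checking on your own data.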