I want to remove the final letter "O" from every word, except where it is part of the word "HELLO".
Here is what I've tried:
a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("[^HELLO]O\\>","",a)
[1] "HELLO " " HELLO" "T " "HO"
but I want
"HELLO X" "D HELLO" "TW X" "H"
Try replacing using the following pattern:
\b(?!HELLO\b)(\w+)O\b
The negative lookahead asserts that the current word is not exactly HELLO. The pattern then captures everything in the word up to its final O, and the replacement keeps only that captured part, which drops the final O.
\b - from the start of the word
(?!HELLO\b) - assert that the word is not HELLO
(\w+)O - match a word ending in O, capturing everything before the final O
\b - end of word
The capture group, if a match happens, will contain the entire word minus the final O.
Code:
a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("\\b(?!HELLO\\b)(\\w+)O\\b", "\\1", a, perl=TRUE)
[1] "HELLO X" "D HELLO" "TW X" "H"
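A few edge cases may be worth checking before relying on this pattern. The sketch below wraps the same gsub() call in a helper (the name strip_final_o is made up for illustration) and assumes the matching should stay case-sensitive:

```r
# Hypothetical helper; it only wraps the gsub() call shown above.
strip_final_o <- function(x) {
  gsub("\\b(?!HELLO\\b)(\\w+)O\\b", "\\1", x, perl = TRUE)
}

strip_final_o(c("HELLO XO", "DO HELLO", "TWO XO", "HO"))
# [1] "HELLO X" "D HELLO" "TW X"    "H"

# A lone "O" is left alone: (\w+) needs at least one character before the O.
strip_final_o("O")
# [1] "O"

# "HELLOO" is not the exact word HELLO, so its final O is removed.
strip_final_o("HELLOO")
# [1] "HELLO"

# The pattern is case-sensitive, so lowercase input is unchanged.
strip_final_o("hello xo")
# [1] "hello xo"
```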
Note that Perl mode must be enabled (perl=TRUE) in gsub in order to use the negative lookahead.
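If a lookahead is not an option, one possible alternative is to pull out each word, edit it, and write it back using base R's gregexpr() and regmatches(). This is only a sketch of a different technique, not the approach above. The helper name drop_o is hypothetical, and a lone "O" is again left untouched:

```r
a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")

# Hypothetical helper: strip a trailing O from every word except "HELLO".
drop_o <- function(w) {
  keep <- w == "HELLO"
  # "(.)O$" requires one character before the final O, so "O" alone is kept.
  w[!keep] <- sub("(.)O$", "\\1", w[!keep])
  w
}

m <- gregexpr("\\w+", a)              # locate every word in each string
b <- a
regmatches(b, m) <- lapply(regmatches(b, m), drop_o)
b
# [1] "HELLO X" "D HELLO" "TW X"    "H"
```

This is longer than the single gsub() call, but the exception for HELLO is written as an ordinary comparison, which may be easier to extend to a list of protected words.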