
Selectively removing trailing string

Tags: regex, r, gsub

I want to remove the trailing letter "O" from each word, except where it is part of the word "HELLO".

I've tried doing this:

Example:

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("[^HELLO]O\\>","",a)

[1] "HELLO " " HELLO" "T " "HO"

but I want

"HELLO X" "D HELLO" "TW X" "H"
dogged asked Nov 26 '25 09:11


1 Answer

Try replacing using the following pattern:

\b(?!HELLO\b)(\w+)O\b

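This pattern asserts, at the start of each word, that the word is not exactly HELLO, and then captures everything up to a final O. The match is replaced with the captured group, which drops that final O. Words that do not end in O, and the word HELLO itself, do not match and are left unchanged.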

\b          - from the start of the word
(?!HELLO\b) - assert that the word is not HELLO
(\w+)O      - match a word ending in O, but don't capture final O
\b          - end of word

The capture group, if a match happens, will contain the entire word minus the final O.
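To see what the capture group holds for individual words, here is a minimal sketch using base R's regexec() and regmatches(). The sample vector `words` is made up for illustration and is not part of the original question.

```r
# Same pattern as in the answer, written as an R string.
pattern <- "\\b(?!HELLO\\b)(\\w+)O\\b"

# Hypothetical sample words, chosen only for illustration.
words <- c("XO", "HELLO", "TWO")

# regexec() reports the full match and each capture group;
# perl = TRUE is needed because the pattern uses a lookahead.
m <- regmatches(words, regexec(pattern, words, perl = TRUE))

# For each word, the first element is the full match and the second
# is capture group 1. Words that do not match give an empty vector.
print(m)
```

For a matching word such as "TWO", the second element is the text that "\\1" inserts back in the gsub call.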

Code:

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("\\b(?!HELLO\\b)(\\w+)O\\b", "\\1", a, perl=TRUE)
[1] "HELLO X" "D HELLO" "TW X"    "H"
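A few edge cases are worth checking before relying on this pattern. The sketch below uses a made-up vector `edge` (not from the original question) to show that only the exact word HELLO is protected, and that a lone "O" is left alone because (\w+) needs at least one character before the final O.

```r
pattern <- "\\b(?!HELLO\\b)(\\w+)O\\b"

# Hypothetical inputs, chosen only to probe the boundaries of the pattern.
edge <- c("OTHELLO", "HELLOO", "O", "HELLO")

res <- gsub(pattern, "\\1", edge, perl = TRUE)
print(res)
# [1] "OTHELL" "HELLO"  "O"      "HELLO"
```

If words that merely contain HELLO, such as OTHELLO, should also keep their final O, the lookahead would need to be adjusted; that is outside what the question asked for.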

Note that we must enable Perl mode (perl=TRUE) in gsub in order to use the negative lookahead.
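As a rough illustration of why the flag matters: R's default regex engine does not support lookaheads, so the same call without perl = TRUE is expected to fail with an invalid-regular-expression error. The sketch below wraps that call in tryCatch() so the script keeps running; the variable names are hypothetical.

```r
a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")
pattern <- "\\b(?!HELLO\\b)(\\w+)O\\b"

# Without perl = TRUE: expected to raise an error, which is caught here
# and turned into a short message instead of stopping the script.
default_engine <- tryCatch(
  suppressWarnings(gsub(pattern, "\\1", a)),
  error = function(e) paste("error:", conditionMessage(e))
)
print(default_engine)

# With perl = TRUE: the lookahead is accepted.
with_perl <- gsub(pattern, "\\1", a, perl = TRUE)
print(with_perl)
```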


Tim Biegeleisen answered Nov 27 '25 22:11


