
R Regex: removing only the immediate following character after >

Tags:

regex

r

gsub

I have the following string in R:

string1 = "A((..A>B)A"

I would like to remove all punctuation, plus the letter immediately after >, i.e. the B in >B.

Here is the output I desire:

output = "AAA"

I tried using gsub() as follows:

output = gsub("[[:punct:]]","", string1)

But this gives "AABA": the punctuation (including >) is removed, but the B that immediately follows > is kept.

ShanZhengYang asked Dec 24 '22 16:12



1 Answer

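This works by keeping your [[:punct:]] pattern and adding a lookbehind alternative in front of it. The lookbehind (?<=>) matches any single character that is preceded by >, so the B is removed, while [[:punct:]] still removes the > itself along with the other punctuation. Lookbehind is not supported by R's default regex engine, so perl=TRUE is required.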

gsub('(?<=>).|[[:punct:]]', '', "A((..A>B)A", perl=TRUE)
## [1] "AAA"
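
As a rough sketch of how this could be reused, the pattern can be wrapped in a small helper. The name strip_punct_and_after_gt is made up for illustration and is not part of the original answer. The second call is an assumed alternative rather than the answerer's method: it avoids the lookbehind by consuming > together with the character after it, so it runs on the default engine. The two patterns can differ on inputs with consecutive > characters, so treat the variant as an option to test against your own data, not a drop-in replacement.

```r
# Hypothetical helper (name is an assumption, not from the original answer).
strip_punct_and_after_gt <- function(x) {
  # Alternative 1: any single character preceded by ">" (lookbehind; needs perl = TRUE).
  # Alternative 2: any punctuation character, which also removes ">" itself.
  gsub("(?<=>).|[[:punct:]]", "", x, perl = TRUE)
}

string1 <- "A((..A>B)A"
output <- strip_punct_and_after_gt(string1)
print(output)
## [1] "AAA"

# Assumed variant without lookbehind: match ">" plus the next character as one unit,
# otherwise match a single punctuation character. Works with the default engine.
output2 <- gsub(">.|[[:punct:]]", "", string1)
print(output2)
```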
Tyler Rinker answered Jan 19 '23 11:01
