Regex in R doesn't work as it does elsewhere

Tags:

regex

r

I have a sentence and a regex query (originally shown in an image) that works as I want. But I'm really stuck porting it to R.


My R query is: gsub("\\S*[^[:alnum:]\\s\\?\\!\",();:\\.'\\/-]+\\S*", "", x)

and it removes everything. I can't find my error. Even the shorter pattern using only [:alnum:], "\\S*[^[:alnum:]]+\\S*", removes everything. I don't understand why. Please help.
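A minimal reproduction of the behaviour described above, run with R's default regex engine. The sample sentence `txt` is a made-up assumption, since the original sentence only appeared in the image. The shorter pattern wipes the string in any regex flavour, because a space is itself a non-alphanumeric character, so `[^[:alnum:]]+` can match it and the `\S*` on each side swallows the neighbouring words.

```r
# Hypothetical sample sentence (the real one was only shown in the image)
txt <- "But what's about"

# The full pattern from the question, run with R's default (TRE) engine
full_pat <- "\\S*[^[:alnum:]\\s\\?\\!\",();:\\.'\\/-]+\\S*"
# The shorter pattern from the question
short_pat <- "\\S*[^[:alnum:]]+\\S*"

full_res <- gsub(full_pat, "", txt)
short_res <- gsub(short_pat, "", txt)

print(full_res)   # [1] ""
print(short_res)  # [1] ""
```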

Peter.k asked Nov 28 '25 at 21:11



1 Answer

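You cannot use the \s shorthand class inside a TRE bracket expression (TRE is the regex engine R uses by default, i.e. without perl=TRUE). Inside a bracket expression a backslash is just a literal character, so [^...\s...] excludes \ and s rather than whitespace. The negated class then matches spaces, and \S*[^...]+\S* spans several words, which is why everything gets removed (the shorter pattern fails the same way, since a space is not alphanumeric). Replace \s with [:space:], and unescape all the other "special" chars too: inside a bracket expression they already match literal symbols, so they should not be escaped either.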
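A small check of the bracket-expression point, as a sketch using only base R's `grepl()`: with the default engine, `"[\\s]"` (the R string for the regex `[\s]`) is a class containing a literal backslash and the letter `s`, so it does not match a space, whereas `[[:space:]]` does. With `perl = TRUE`, PCRE treats `\s` inside a class as whitespace.

```r
# Default TRE engine: the backslash is literal inside [...]
tre_space <- grepl("[\\s]", " ")         # FALSE: no whitespace in the class
tre_letter <- grepl("[\\s]", "s")        # TRUE: the letter s is in the class
posix_space <- grepl("[[:space:]]", " ") # TRUE: POSIX class works in TRE

# PCRE engine: \s keeps its shorthand meaning inside a class
pcre_space <- grepl("[\\s]", " ", perl = TRUE)  # TRUE
```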

# Drop every whitespace-delimited chunk that contains a character outside the allowed set
pat <- "\\S*[^[:alnum:][:space:]?!\",();:.'/-]+\\S*"
x <- "But what's about in a interacting QFT a 2-particla state in the far past: $|E_{\\bf p_1}, {\\bf p_1}, E_{\\bf p_2} {\\bf p_2}>$ which undergoes"
gsub(pat, "", x)

Note that gsub(pat, "", x, perl=TRUE) works as well.
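A sketch that runs the corrected pattern through both engines and compares the results. The final clean-up line, which collapses the runs of spaces left where chunks were removed, is an assumed extra step and not part of the answer above.

```r
pat <- "\\S*[^[:alnum:][:space:]?!\",();:.'/-]+\\S*"
x <- "But what's about in a interacting QFT a 2-particla state in the far past: $|E_{\\bf p_1}, {\\bf p_1}, E_{\\bf p_2} {\\bf p_2}>$ which undergoes"

res_tre <- gsub(pat, "", x)               # default TRE engine
res_pcre <- gsub(pat, "", x, perl = TRUE) # PCRE engine

# Assumed extra step: collapse leftover whitespace runs and trim the ends
tidy <- trimws(gsub("[[:space:]]+", " ", res_tre))
print(tidy)
# [1] "But what's about in a interacting QFT a 2-particla state in the far past: which undergoes"
```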


Wiktor Stribiżew answered Nov 30 '25 at 10:11



