Remove all punctuation except underline between characters in R with POSIX character class

Tags: posix, r, gsub

I would like to use R to remove all punctuation, keeping only the underscores between words. In particular, the code should also remove underscores at the beginning or end of a word. For the example below, the result should be 'hello_world and hello_world'. I want to use the pre-built POSIX character classes. Right now I have learned how to exclude particular characters with the following code, but I don't know how to use the word boundary sequences.

test <- "hello_world and _hello_world_"
gsub("[^_[:^punct:]]", "", test, perl = TRUE)
rupi42 asked Dec 01 '25 11:12


1 Answer

You can use

gsub("[^_[:^punct:]]|_+\\b|\\b_+", "", test, perl=TRUE)

Details:

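  • [^_[:^punct:]] - a negated bracket expression that excludes _ and every non-punctuation character, so it matches any punctuation character except _
  • | - or
  • _+\b - one or more _ followed by a word boundary, i.e. at the end of a word
  • | - or
  • \b_+ - one or more _ preceded by a word boundary, i.e. at the start of a word

Since _ is itself a word character, there is no word boundary between a letter and _, so underscores between letters are left alone.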
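
As a quick check, here is a minimal sketch that wraps the pattern in a small helper and runs it on the question's string plus one extra input. The helper name strip_punct_keep_inner_underscores and the second input string are made up for illustration; only the gsub() call comes from the answer.

```r
# Hypothetical helper (the name is an assumption, not part of the answer):
# removes all punctuation except underscores that sit between word characters.
strip_punct_keep_inner_underscores <- function(x) {
  gsub("[^_[:^punct:]]|_+\\b|\\b_+", "", x, perl = TRUE)
}

test <- "hello_world and _hello_world_"
res1 <- strip_punct_keep_inner_underscores(test)
print(res1)
# [1] "hello_world and hello_world"

# Made-up input: other punctuation is dropped, inner underscores
# (even doubled ones) are kept, leading/trailing ones are removed.
res2 <- strip_punct_keep_inner_underscores("snake_case, _private_ and a__b!")
print(res2)
# [1] "snake_case private and a__b"
```

Note that perl = TRUE is required here: the negated POSIX class [:^punct:] is a PCRE (Perl-style) extension.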
Wiktor Stribiżew answered Dec 04 '25 01:12