
Regex; eliminate all punctuation except

Tags: regex, r, strsplit

I have the following regex that splits on any space or punctuation character. How can I exclude one or more punctuation characters from [:punct:]? Let's say I'd like to exclude apostrophes and commas. I know I could explicitly use [all punctuation marks in here] instead of [[:punct:]], but I'm hoping for an exclusion method.

X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)

 [1] "I"       "'"       "m"       "not"     "that"    "good"    "at"      "regex"   "yet"    
[10] ","       ""        "but"     "am"      "getting" "better"  "!"
Tyler Rinker asked Nov 14 '12 03:11



1 Answer

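It's not clear to me what you want the result to be, but you might be able to use a negated character class, as in this answer. The class [^,'[:^punct:]] is a double negative: it matches any character that is not a comma, not an apostrophe, and not a non-punctuation character, i.e. any punctuation mark other than a comma or an apostrophe.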

R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
 [1] "I'm"     "not"     "that"    "good"    "at"      "regex"   "yet,"   
 [8] "but"     "am"      "getting" "better"  "!"    
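
To see which characters the negated class actually matches, it may help to test it on single characters. The following is a minimal sketch, not part of the original answer: the `chars` vector is a made-up test set, and the negative-lookahead pattern is an alternative spelling added here as an assumption, shown only to check that it agrees with the double-negative class on these inputs.

```r
# Hypothetical test vector: one character per element.
chars <- c(",", "'", "!", ".", "a", " ")

# Double negative from the answer: not a comma, not an apostrophe,
# and not a non-punctuation character.
double_neg <- "[^,'[:^punct:]]"

# Alternative spelling (an assumption, not from the answer): a negative
# lookahead rejects comma and apostrophe before the POSIX class is tried.
lookahead <- "(?![,'])[[:punct:]]"

# perl = TRUE is needed for both [:^punct:] and the lookahead.
hits_double_neg <- grepl(double_neg, chars, perl = TRUE)
hits_lookahead  <- grepl(lookahead, chars, perl = TRUE)

# Only "!" and "." should be flagged as punctuation to split on.
print(setNames(hits_double_neg, chars))
print(identical(hits_double_neg, hits_lookahead))
```

Either pattern could then replace [[:punct:]] inside the lookahead of the strsplit call; to exclude other characters, add them next to the comma and apostrophe.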
Joshua Ulrich answered Oct 31 '22 12:10
