R regex remove all punctuation except apostrophe [duplicate]

Tags: regex, r

I'm trying to remove all punctuation from a string except apostrophes. Here's my example:

str2 <- "this doesn't not have an apostrophe,.!@#$%^&*()"
gsub("[[:punct:,^\\']]"," ", str2 )
# [1] "this doesn't not have an apostrophe,.!@#$%^&*()"

What am I doing wrong?

screechOwl asked Mar 06 '13 18:03


2 Answers

A "negative lookahead assertion" can be used to remove from consideration any apostrophes, before they are even tested for being punctuation characters.

gsub("(?!')[[:punct:]]", "", str2, perl=TRUE)
# [1] "this doesn't not have an apostrophe"
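
As a small sketch of how the lookahead behaves on other input (the vector `x` and its sample strings are made up for illustration and are not from the question):

```r
# Hypothetical sample input, chosen only to illustrate the pattern.
x <- c("it's 5 o'clock, isn't it?!", "rock'n'roll: (live)")

# (?!') fails at any position where the next character is an apostrophe,
# so [[:punct:]] is only tried against non-apostrophe characters.
# perl = TRUE is needed because the default regex engine has no lookaheads.
cleaned <- gsub("(?!')[[:punct:]]", "", x, perl = TRUE)
print(cleaned)
# [1] "it's 5 o'clock isn't it" "rock'n'roll live"
```

Because the replacement is the empty string, punctuation is deleted rather than replaced by a space; pass " " instead if words separated only by punctuation should stay apart.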
Josh O'Brien answered Oct 02 '22 20:10


I'm not sure you can express "all punctuation except '" inside a bracket expression the way you've tried. I would instead negate a set of allowed characters (lowercase letters, the apostrophe, and the space), so that everything else is removed:

gsub("[^'[:lower:] ]", "", str2) # per Joshua's comment
# [1] "this doesn't not have an apostrophe"
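
One caveat with this approach is that `[:lower:]` also strips uppercase letters and digits. A possible variant, assuming those should be kept, swaps in `[:alnum:]` (the sample string `x` is hypothetical):

```r
# Hypothetical sample input containing an uppercase letter and a digit.
x <- "R's regex, v2!"

# Keep only lowercase letters, apostrophes and spaces.
lower_only <- gsub("[^'[:lower:] ]", "", x)

# Keep all letters and digits, plus apostrophes and spaces.
with_alnum <- gsub("[^'[:alnum:] ]", "", x)

print(lower_only)
# [1] "'s regex v"
print(with_alnum)
# [1] "R's regex v2"
```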
Arun answered Oct 02 '22 18:10